Ayumi's LLM Role Play & ERP Ranking (Version 3)
This ranking table rates different LLMs and tries to determine which model is most suitable for (erotic) role play (ERP) using an automated benchmark. Unfortunately this automated benchmark has its limits, but the table can serve as a starting point when looking for LLM models to try out.
- Other Rankings/Comparisons:
- BestERP Community ratings for AI character, role-playing and story-telling services and models
- Censorbench - Evaluate how censored one LLM is, by using another LLM
- The Crestfall Project - Leaderboard
- u/WolframRavenwolf on reddit:
  - LLM Comparison/Test: 2x 34B Yi (Dolphin, Nous Capybara) vs. 12x 70B, 120B, ChatGPT/GPT-4 by u/WolframRavenwolf
  - Huge LLM Comparison/Test: 39 models tested (7B-70B + ChatGPT/GPT-4) by u/WolframRavenwolf
  - Huge LLM Comparison/Test: Part II (7B-20B) Roleplay Tests by u/WolframRavenwolf
  - New Model RP Comparison/Test (7 models tested) by u/WolframRavenwolf - reddit/r/LocalLLaMA
  - Big Model Comparison/Test (13 models tested) by u/WolframRavenwolf - reddit/r/LocalLLaMA
- Another LLM Roleplay Rankings - by AliCat and Trappu - https://rentry.co/ALLMRR
- HuggingFaceH4 - Open LLM Leaderboard
Interpretation Warning: Writing quality is not covered!
Disclaimer: This benchmark makes no statement about how well an LLM will be able to drive the story forward. It also cannot determine coherence within a longer role play chat, and the quality of the generated text is not tested. For more information, look in these sections: Known Flaws of the ALC-IQ and Known Flaws of the ERP3 Score and ERP Variety Score
##################
You can find the most up-to-date table and changelog on my new landing page: http://ayumi.m8geil.de/
##################
Column | Description |
---|---|
ALC-IQ3 | The ALC-IQ3 is the 3rd version of the ALC-IQ. It tries to determine how well a model understands a character card. The higher the better. Best score is 100. |
ERP3 Score | The average ratio of lewd words vs. words in a response. The higher the better. |
Var Score | The lewd word variety score. It counts how many different lewd words occur across all ERP responses. |
Updated: 2023-11-21 13:11:36 (UTC+01:00) Changelog
Note: For an interactive table look here: http://ayumi.m8geil.de/ayumi_bench_v3_results.html
Rank | Name | Size | Q | ALC-IQ3 | ERP3 Score | Var Score |
---|---|---|---|---|---|---|
1 | Neural Chat V3 16k 7B | 7B | Q8_0 | 89.33 | 30.92 | 572 |
2 | Neural Chat V3-1 7B | 7B | Q6_K | 88.18 | 30.42 | 468 |
3 | U Amethyst 20B | 20B | Q5_K_M | 88.86 | 30.95 | 455 |
4 | LLaMA-2 Ensemble v6 13B | 13B | Q5_K_M | 86.93 | 29.25 | 482 |
5 | Airoboros L2 2.2 70B | 70B | Q4_K_M | 88.20 | 29.16 | 459 |
6 | Synatra V0.3 RP 7B | 7B | Q8_0 | 82.72 | 35.15 | 453 |
7 | PsyMedRP V1 20B | 20B | Q5_K_M | 88.48 | 30.59 | 440 |
8 | Euryale Inverted L2 70B | 70B | Q4_K_M | 87.15 | 32.53 | 417 |
9 | ORCA LLaMA QLoRA 70B | 70B | Q4_K_M | 90.07 | 30.77 | 396 |
10 | Emerhyst 20B | 20B | Q5_K_M | 88.33 | 29.20 | 423 |
11 | Utopia 13B | 13B | Q5_K_M | 85.05 | 30.85 | 439 |
12 | StellarBright 70B | 70B | Q4_K_M | 88.56 | 30.36 | 404 |
13 | Synatra V0.3 RP 7B | 7B | Q4_K_M | 82.69 | 34.04 | 425 |
14 | Nethena 20B | 20B | Q5_K_M | 86.35 | 32.60 | 400 |
15 | Sheep Duck LLaMA 2 V1.1 70B | 70B | Q4_K_M | 89.24 | 31.23 | 377 |
16 | Synatra V0.3 RP AshhLimaRP Mistral 7B | 7B | Q5_K_M | 83.33 | 33.20 | 418 |
17 | Sheep Duck LLaMA 2 13B | 13B | Q5_K_M | 87.83 | 30.39 | 400 |
18 | Nethena MLewd Xwin 23B | 23B | Q5_K_M | 83.30 | 33.98 | 405 |
19 | Stairolz 70B | 70B | Q4_K_S | 88.32 | 30.56 | 382 |
20 | Misted 7B | 7B | Q6_K | 86.00 | 32.54 | 379 |
21 | Upstage LLaMA Instruct 65B | 65B | Q4_K_M | 88.45 | 32.86 | 347 |
22 | Toppy M 7B | 7B | Q5_K_M | 89.30 | 32.81 | 336 |
23 | StableBeluga 2 70B | 70B | Q4_K_M | 87.51 | 29.39 | 391 |
24 | Xwin LM V0.1 70B | 70B | Q4_K_M | 88.54 | 31.02 | 362 |
25 | Zephyr Alpha 7B | 7B | Q5_K_M | 87.50 | 33.03 | 351 |
26 | OpenHermes 2.5 AshhLimaRP Mistral 7B | 7B | Q5_K_M | 88.53 | 28.69 | 385 |
27 | SlimOpenOrca Mistral 7B | 7B | Q5_K_M | 88.49 | 27.02 | 403 |
28 | MM ReMM L2 20B | 20B | Q5_K_M | 87.92 | 31.25 | 362 |
29 | ZephRP M 7B | 7B | Q5_K_M | 85.79 | 30.01 | 397 |
30 | X NoroChronos 13B | 13B | Q5_K_M | 84.03 | 33.40 | 377 |
31 | Stheno 1.8 13B | 13B | Q5_K_M | 84.72 | 31.27 | 390 |
32 | ReMM S Kimiko v2 13B | 13B | Q5_K_M | 76.27 | 32.97 | 459 |
33 | Nous Capybara 34B | 34B | Q4_K_M | 83.51 | 30.98 | 402 |
34 | Dolphin 2.1 Mistral 7B | 7B | Q5_K_M | 86.69 | 31.74 | 359 |
35 | Echidna Tiefigther 25 13B | 13B | Q5_K_M | 81.40 | 32.49 | 400 |
36 | GodziLLa 2 70B | 70B | Q4_K_M | 83.19 | 30.29 | 404 |
37 | Zephyr Alpha 7B | 7B | Q6_K | 87.34 | 31.13 | 351 |
38 | Athnete 13B | 13B | Q5_K_M | 81.91 | 31.53 | 403 |
39 | Airoboros L2 2.1 70B | 70B | Q4_K_M | 83.50 | 31.26 | 389 |
40 | Zephyr Cucumber Instruct 7B | 7B | Q5_K_M | 86.22 | 32.22 | 350 |
41 | DaringFortitude 13B | 13B | Q5_K_M | 89.75 | 25.96 | 379 |
42 | ShiningValiantXS 13B | 13B | Q5_K_M | 89.75 | 25.96 | 379 |
43 | Mistral OpenOrca 7B | 7B | Q5_K_M | 86.48 | 28.91 | 381 |
44 | Dolphin 2.2 70B | 70B | Q4_K_M | 88.57 | 30.85 | 337 |
45 | LLaMA-2 Chat AYT 13B | 13B | Q5_K_M | 89.88 | 26.82 | 364 |
46 | Augmental Unholy 13B | 13B | Q5_K_M | 85.38 | 31.33 | 362 |
47 | Nete 13B | 13B | Q5_K_M | 79.74 | 30.16 | 434 |
48 | Athena V4 13B | 13B | Q5_K_M | 80.98 | 30.95 | 411 |
49 | Airoboros L2 2.2.1 70B | 70B | Q4_K_M | 87.47 | 30.50 | 346 |
50 | X MythoChronos 13B | 13B | Q5_K_M | 80.40 | 31.43 | 409 |
51 | MistralMakise Merged 13B | 13B | Q5_K_M | 80.73 | 29.43 | 425 |
52 | MLewdBoros SuperCOT 13B | 13B | Q5_K_M | 82.63 | 32.96 | 366 |
53 | LLaMA-2 Chat AYB 13B | 13B | Q5_K_M | 87.83 | 28.75 | 355 |
54 | Nethena 13B | 13B | Q5_K_M | 80.40 | 31.44 | 404 |
55 | Echidna V0.1 13B | 13B | Q5_K_M | 80.41 | 31.80 | 400 |
56 | Unholy v1 12L 13B | 13B | Q5_K_M | 82.39 | 31.21 | 385 |
57 | Noromaid V0.1 13B | 13B | Q5_K_M | 77.94 | 27.78 | 468 |
58 | Hermes Trismegistus Mistral 7B | 7B | Q5_K_M | 87.87 | 30.66 | 331 |
59 | HornyEchidna V0.1 13B | 13B | Q5_K_M | 80.48 | 30.74 | 405 |
60 | Euryale 1.4 L2 70B | 70B | Q4_K_S | 85.79 | 29.68 | 360 |
61 | Airoboros L2 2.1 Creative 70B | 70B | Q4_K_M | 83.27 | 30.23 | 379 |
62 | Noromaid V0.1.1 13B | 13B | Q5_K_M | 81.25 | 31.98 | 378 |
63 | Airoboros L2 3.1 70B | 70B | Q4_K_M | 84.85 | 29.26 | 368 |
64 | Trion M 7B | 7B | Q4_K | 87.57 | 30.92 | 321 |
65 | Athena v3 13B | 13B | Q5_K_M | 81.45 | 32.76 | 366 |
66 | OpenHermes 2.5 Mistral 7B | 7B | Q5_K_M | 87.65 | 29.17 | 337 |
67 | Airoboros L2 3.1.2 70B | 70B | Q4_K_M | 82.96 | 28.51 | 392 |
68 | Zephyr Beta 7B | 7B | Q5_K_M | 85.83 | 31.92 | 323 |
69 | UtopiaXL 13B | 13B | Q5_K_M | 77.18 | 28.39 | 451 |
70 | Dolphin 2.1 OpenOrca 7B | 7B | Q5_K_M | 86.27 | 29.64 | 339 |
71 | PsyFighter 13B | 13B | Q5_K_M | 78.53 | 30.46 | 409 |
72 | Amethyst 13B | 13B | Q5_K_M | 80.06 | 26.90 | 430 |
73 | Mistral Dolphin 2.1 LIMA0.5 7B | 7B | Q5_K_M | 85.34 | 28.02 | 362 |
74 | Stheno Inverted 1.2 13B | 13B | Q5_K_M | 76.95 | 30.98 | 418 |
75 | Nethena Glued 20B | 20B | Q4_K_M | 81.26 | 28.15 | 402 |
76 | Athena v2 13B | 13B | Q5_K_M | 79.15 | 31.10 | 392 |
77 | Stheno 1.3 13B | 13B | Q5_K_M | 72.94 | 31.12 | 457 |
78 | MLewd v2 13B | 13B | Q5_K_M | 77.99 | 31.81 | 395 |
79 | MLewd V2-1 13B | 13B | Q5_K_M | 76.63 | 30.48 | 422 |
80 | Unholy v1.1 13B | 13B | Q5_K_M | 86.20 | 30.72 | 318 |
81 | Dolphin 2.2 Yi 34B | 34B | Q4_K_M | 87.46 | 31.62 | 295 |
82 | Noromaid V0.1.1 20B | 20B | Q5_K_M | 83.18 | 30.14 | 356 |
83 | MLewd Chat V2 13B | 13B | Q5_K_M | 82.48 | 27.69 | 389 |
84 | Airoboros L2 GPT4 1.4.1 70B | 70B | Q4_K_M | 82.93 | 28.47 | 376 |
85 | Synthia V1.1 70B | 70B | Q4_K_M | 82.94 | 30.00 | 359 |
86 | MLewd V2.4 13B | 13B | Q5_K_M | 78.01 | 28.23 | 430 |
87 | OpenRP SuperCOT 13B | 13B | Q5_K_M | 84.47 | 30.59 | 336 |
88 | BerrySauce 13B | 13B | Q5_K_M | 77.09 | 32.42 | 394 |
89 | ReMM Mistral 13B | 13B | Q5_K_M | 80.58 | 28.63 | 396 |
90 | Xwin MLewd V0.2 13B | 13B | Q5_K_M | 78.53 | 29.06 | 412 |
91 | Thespis Mistral V0.5 7B | 7B | Q5_K_M | 82.51 | 33.89 | 318 |
92 | LLaMA 2 Tiefighter 13B | 13B | Q5_K_M | 78.89 | 30.07 | 393 |
93 | lzlv 70B | 70B | Q4_K_M | 86.02 | 29.61 | 321 |
94 | MLewdBoros 13B | 13B | Q5_K_M | 75.78 | 31.52 | 407 |
95 | ReMM v2 Kimiko v2 13B | 13B | Q5_K_M | 81.60 | 29.65 | 365 |
96 | Dolphin 2.2.1 AshhLimaRP Mistral 7B | 7B | Q5_K_M | 85.00 | 27.09 | 355 |
97 | ReMM v2 13B | 13B | Q5_K_M | 78.69 | 31.74 | 372 |
98 | Stheno Variants L2 13B | 13B | Q5_K_M | 76.91 | 31.09 | 397 |
99 | Kaori V1 70B | 70B | Q4_K_M | 81.40 | 29.63 | 365 |
100 | Amethyst Mistral 13B | 13B | Q4_K_S | 79.25 | 26.53 | 419 |
101 | Naberius 7B | 7B | Q5_K_M | 85.44 | 29.89 | 317 |
102 | Dolphin 2.2.1 Mistral 7B | 7B | Q5_K_M | 85.99 | 29.25 | 317 |
103 | MythoMax Kimiko Mix 13B | 13B | Q5_K_M | 79.60 | 29.16 | 385 |
104 | Mistral AirOmniMix 11B | 11B | Q6_K | 83.39 | 29.89 | 337 |
105 | AppleSauce 13B | 13B | Q5_K_M | 76.47 | 32.42 | 383 |
106 | Euryale 1.3 L2 70B | 70B | Q4_K_M | 84.26 | 26.92 | 358 |
107 | TimeCrystal L2 13B | 13B | Q5_K_M | 77.82 | 31.49 | 376 |
108 | Chronob 1.4 Lin 70B | 70B | Q4_K_S | 80.65 | 29.14 | 371 |
109 | Hexoteric 7B | 7B | Q5_K_M | 87.49 | 31.81 | 270 |
110 | MistralLite 7B | 7B | Q5_K_M | 82.20 | 28.93 | 356 |
111 | MistRP AirOrca 7B | 7B | Q5_K_M | 84.09 | 29.26 | 332 |
112 | Kanelsnegl V0.1 7B | 7B | Q4_K | 85.56 | 29.15 | 317 |
113 | MythoMax Kimiko V2 13B | 13B | Q5_K_M | 79.53 | 28.94 | 381 |
114 | OpenChat 3.5 7B | 7B | Q5_K_M | 88.09 | 28.63 | 293 |
115 | Echidna V0.2 13B | 13B | Q5_K_M | 78.56 | 31.22 | 366 |
116 | MLewd Chat 13B | 13B | Q5_K_M | 83.69 | 28.88 | 336 |
117 | OpenRP 13B | 13B | Q5_K_M | 77.04 | 28.42 | 411 |
118 | L2 TheSpurral M2.2 13B | 13B | Q5_K_M | 78.49 | 30.83 | 369 |
119 | Stheno Inverted 13B | 13B | Q5_K_M | 77.19 | 29.65 | 393 |
120 | MXLewdMini 13B | 13B | Q5_K_M | 79.11 | 31.94 | 347 |
121 | Airoboros M 3.1.1 7B | 7B | Q5_K_M | 83.01 | 32.72 | 297 |
122 | TimeCrystal l2 13B | 13B | Q5_K_S | 77.57 | 30.11 | 382 |
123 | StableBeluga 13B | 13B | Q5_K_M | 82.65 | 28.02 | 350 |
124 | Augmental ReMM 13B | 13B | Q5_K_M | 75.45 | 28.32 | 423 |
125 | OpenHermes 2 Mistral 7B | 7B | Q5_K_M | 84.16 | 30.08 | 312 |
126 | MLewd v2-2 13B | 13B | Q5_K_M | 76.26 | 31.85 | 376 |
127 | UndiMix v3 13B | 13B | Q5_K_M | 78.32 | 30.55 | 368 |
128 | Zaraxls 7B | 7B | Q5_K_M | 74.56 | 30.29 | 410 |
129 | Magdump 13B | 13B | Q5_K_M | 78.09 | 28.74 | 389 |
130 | UndiMix V3 13B | 13B | Q5_K_M | 78.25 | 31.68 | 356 |
131 | Echidna V0.3 13B | 13B | Q5_K_M | 78.52 | 30.19 | 368 |
132 | Unholy v1 10L 13B | 13B | Q5_K_M | 80.59 | 29.19 | 355 |
133 | Tai 70B | 70B | Q4_K_M | 79.69 | 29.76 | 357 |
134 | Dawn V2 70B | 70B | Q4_K_M | 86.62 | 27.47 | 308 |
135 | Mistral SciPhi 32k 7B | 7B | Q5_K_M | 83.43 | 29.02 | 325 |
136 | SynthIA V1.5 70B | 70B | Q4_K_M | 79.54 | 29.92 | 356 |
137 | ReMM SLERP 13B | 13B | Q5_K_M | 77.75 | 28.95 | 385 |
138 | Huginn v1.2 13B | 13B | Q5_K_M | 77.75 | 28.95 | 385 |
139 | MythoMax 13B | 13B | Q5_K_M | 77.75 | 28.95 | 385 |
140 | Airoboros 2.1 33B | 33B | Q4_K_M | 75.62 | 31.82 | 377 |
141 | ReMM 13B | 13B | Q5_K_M | 74.55 | 29.07 | 416 |
142 | SciPhi Self RAG Mistral 32k 7B | 7B | Q5_K_M | 88.00 | 30.06 | 263 |
143 | ZettaPi 13B | 13B | Q5_K_M | 78.36 | 28.46 | 382 |
144 | ReMM PIPPA 13B | 13B | Q5_K_M | 74.73 | 29.34 | 410 |
145 | L2 TheSpurral M2 13B | 13B | Q5_K_S | 76.32 | 31.43 | 371 |
146 | MLewd V2-1 015 13B | 13B | Q4_K_S | 75.96 | 30.13 | 387 |
147 | Eileithyia 13B | 13B | Q5_K_M | 73.39 | 27.70 | 440 |
148 | Synthia v1.3 7B | 7B | Q5_K_M | 78.71 | 32.46 | 333 |
149 | UndiMix v4 13B | 13B | Q5_K_M | 79.02 | 32.24 | 332 |
150 | Chupacabra 7B | 7B | Q8_0 | 87.64 | 26.76 | 299 |
151 | Augmental V1.50 A 13B | 13B | Q5_K_M | 77.43 | 29.72 | 375 |
152 | ReMM v1 LRPSGPT 2Char 13B | 13B | Q5_K_M | 74.89 | 32.40 | 373 |
153 | PsyFighter2 13B | 13B | Q5_K_M | 78.47 | 27.40 | 387 |
154 | Emerhyst 13B | 13B | Q5_K_M | 78.44 | 25.58 | 404 |
155 | Tess Medium 200K V1.0 34B | 34B | Q4_K_M | 76.72 | 29.98 | 375 |
156 | MythoMix 13B | 13B | Q5_K_M | 76.34 | 29.51 | 384 |
157 | Mistral Phibrarian 32K 7B | 7B | Q5_K_M | 83.54 | 29.50 | 307 |
158 | Eileithyia 7B | 7B | Q8_0 | 83.58 | 27.88 | 323 |
159 | LimaBean 13B | 13B | Q5_K_M | 77.88 | 27.00 | 391 |
160 | MLewdBoros LRPSGPT 2Char 13B | 13B | Q5_K_M | 76.78 | 28.83 | 382 |
161 | ReMM v2.2 13B | 13B | Q5_K_M | 79.63 | 31.11 | 327 |
162 | Airoboros M 3.1.2 7B | 7B | Q5_K_M | 84.02 | 32.09 | 270 |
163 | Speechless Mistral Dolphin Orca Platypus Samantha 7B | 7B | Q5_K_M | 84.36 | 27.96 | 310 |
164 | UndiMix v2 13B | 13B | Q5_K_M | 79.50 | 32.22 | 316 |
165 | Vigostral Chat 7B | 7B | Q5_K_M | 81.38 | 30.52 | 314 |
166 | LLaMA 2 TiefighterLR 13B | 13B | Q5_K_M | 74.47 | 26.75 | 424 |
167 | Xwin LM V0.2 13B | 13B | Q5_K_M | 79.27 | 28.46 | 355 |
168 | Euryale L2 70B | 70B | Q4_K_M | 79.94 | 27.13 | 362 |
169 | Camel Platypus2 70B | 70B | Q4_K_M | 82.90 | 26.49 | 337 |
170 | Tulpar Limarp 7B | 7B | Q5_K_M | 78.48 | 28.28 | 364 |
171 | Yi GiftedConvo Merged 34B | 34B | Q4_K_M | 84.65 | 28.19 | 299 |
172 | PsyMedRP V1 13B | 13B | Q5_K_M | 76.76 | 26.07 | 404 |
173 | SynthiAthena V2 13B | 13B | Q5_K_M | 78.89 | 30.92 | 328 |
174 | Mistralic 1 7B | 7B | Q5_K_M | 80.67 | 28.41 | 335 |
175 | Uncensored Jordan 33B | 33B | Q5_K_M | 87.37 | 31.75 | 227 |
176 | Phind CodeLlama V2 34B | 34B | Q4_K_M | 85.07 | 26.77 | 302 |
177 | Lewd Sydney 20B | 20B | Q4_K_S | 82.58 | 27.49 | 320 |
178 | Mistral CC Air 11B | 11B | Q5_K_M | 79.25 | 29.19 | 337 |
179 | Chronoboros 33B | 33B | Q4_K_M | 74.93 | 31.21 | 360 |
180 | Magpie 13B | 13B | Q5_K_M | 78.02 | 29.06 | 350 |
181 | Airoboros Mistral 2.2 7B | 7B | Q5_K_M | 80.23 | 32.39 | 290 |
182 | Inkbot 4k 13B | 13B | Q4_K_M | 77.20 | 28.14 | 367 |
183 | UndiMix V4 13B | 13B | Q5_K_M | 79.01 | 30.81 | 319 |
184 | Mistral RP 0.1 7B | 7B | Q5_K_M | 77.86 | 29.05 | 349 |
185 | MLewd 13B | 13B | Q5_K_M | 74.71 | 32.08 | 348 |
186 | ReMM v2.1 13B | 13B | Q5_K_M | 77.29 | 31.89 | 322 |
187 | Augmental V1.50 B 13B | 13B | Q5_K_M | 76.91 | 28.75 | 359 |
188 | Thorns 13B | 13B | Q5_K_M | 79.08 | 35.10 | 268 |
189 | airoboros L2 3.1 13B | 13B | Q5_K_M | 79.86 | 31.35 | 299 |
190 | Airochronos 33B | 33B | Q4_K_M | 75.00 | 31.83 | 342 |
191 | LlongOrca 16K 13B | 13B | Q5_K_M | 78.47 | 25.83 | 368 |
192 | Guanaco 65B | 65B | Q4_K_M | 78.85 | 29.00 | 330 |
193 | Slerpeno 13B | 13B | Q5_K_M | 74.74 | 32.92 | 330 |
194 | MLewd V2-1 050 13B | 13B | Q4_K_S | 74.13 | 28.69 | 381 |
195 | ReMM 0.65 SLERP 13B | 13B | Q5_K_M | 76.25 | 30.18 | 342 |
196 | Chronos V2 70B | 70B | Q4_K_M | 76.67 | 27.76 | 362 |
197 | Athena v1 13B | 13B | Q5_K_M | 74.58 | 30.74 | 352 |
198 | OpenBuddy Zephyr V14.1 7B | 7B | Q5_K_M | 74.35 | 30.75 | 352 |
199 | ReMM Lion 13B | 13B | Q5_K_M | 76.02 | 27.85 | 363 |
200 | MythoMakiseMerged 13B | 13B | Q5_K_M | 77.02 | 27.77 | 351 |
201 | WizardLM V1.0 70B | 70B | Q4_K_M | 85.99 | 26.56 | 269 |
202 | Airoboros L2 2.2.1 13B | 13B | Q5_K_M | 75.19 | 31.09 | 335 |
203 | Nanbeige Chat 16B | 16B | Q4_K_M | 80.54 | 30.05 | 289 |
204 | Airoboros L2 3.0 13B | 13B | Q5_K_M | 75.97 | 29.32 | 345 |
205 | Mistral SynthIAirOmniMix 11B | 11B | Q5_K_M | 79.14 | 29.18 | 310 |
206 | UndiMix v1 13B | 13B | Q5_K_M | 77.78 | 30.73 | 307 |
207 | Vicuna V1.5 16K 13B | 13B | Q5_K_M | 78.64 | 28.54 | 321 |
208 | Airoboros L2 C 3.1.2 70B | 70B | Q4_K_M | 87.88 | 27.27 | 236 |
209 | ReMM v2 Variant 13B | 13B | Q5_K_M | 78.05 | 30.61 | 304 |
210 | Nous Hermes 13B | 13B | Q5_K_M | 81.22 | 31.81 | 257 |
211 | LimaRP V2 LLaMA 2 70B | 70B | Q3_K_M | 74.97 | 24.06 | 403 |
212 | Mistral CC Air RP 11B | 11B | Q5_K_M | 79.30 | 27.80 | 317 |
213 | Airoboros 2.1 13B | 13B | Q5_K_M | 71.16 | 28.94 | 391 |
214 | Airoboros Creative lmoe 13B | 13B | Q5_K_M | 71.22 | 29.61 | 382 |
215 | Mistral ClaudeLimaRP v3 7B | 7B | Q5_K_M | 73.78 | 27.59 | 375 |
216 | Tess XS V1.0 7B | 7B | Q8_0 | 78.24 | 30.41 | 294 |
217 | Thespis Mistral V0.6 7B | 7B | Q6_K | 82.64 | 30.70 | 243 |
218 | Spicyboros 2.2 13B | 13B | Q4_K_M | 70.58 | 28.50 | 389 |
219 | CollectiveCognition V1.1 Mistral 7B | 7B | Q5_K_M | 85.28 | 27.02 | 246 |
220 | Platypus 2 70B | 70B | Q4_K_M | 78.04 | 26.06 | 330 |
221 | AstraMix 7B | 7B | Q5_K_M | 72.52 | 28.74 | 359 |
222 | MistRP Airoboros 7B | 7B | Q5_K_M | 80.48 | 30.61 | 254 |
223 | Airoboros 2.2 13B | 13B | Q5_K_M | 70.45 | 28.91 | 378 |
224 | LLaMA-2 LoRA Assemble 7B | 7B | Q5_K_M | 77.82 | 30.07 | 287 |
225 | Opus V0 70B | 70B | Q4_K_M | 79.32 | 24.22 | 333 |
226 | MegaMix A1 13B | 13B | Q5_K_M | 76.53 | 29.22 | 309 |
227 | Mythalion 13B | 13B | Q5_K_M | 74.39 | 29.05 | 332 |
228 | AshhLimaRP Mistral 7B | 7B | Q5_K_M | 74.07 | 24.68 | 380 |
229 | Mistral OpenOrca oasst top1 2023-08-25 V1 7B | 7B | Q5_K_M | 78.27 | 25.68 | 324 |
230 | Airoboros L2 3.1.1 13B | 13B | Q5_K_M | 75.98 | 27.83 | 324 |
231 | L2 TheSpurral V2 13B | 13B | Q5_K_S | 71.22 | 30.53 | 345 |
232 | LLaMA 2 Chat 70B | 70B | Q4_K_M | 86.76 | 24.82 | 241 |
233 | MegaMix T1 13B | 13B | Q5_K_M | 76.36 | 29.70 | 298 |
234 | Yi 200K Airo Claude Puffin 6B | 6B | Q6_K | 71.43 | 28.39 | 364 |
235 | Yarn Mistral 64k 7B | 7B | Q5_K_M | 75.03 | 27.89 | 331 |
236 | Zarafusionex 1.1 7B | 7B | Q5_K_M | 71.08 | 28.61 | 365 |
237 | TerraMix 16K 13B | 13B | Q5_K_M | 74.97 | 25.92 | 352 |
238 | LLaMA 2 Chat Uncensored 70B | 70B | Q4_K_M | 75.19 | 30.22 | 302 |
239 | Vicuna 33B | 33B | Q4_K_M | 79.25 | 30.58 | 254 |
240 | Dans TotSirocco 7B | 7B | Q5_K_M | 79.47 | 27.54 | 283 |
241 | OpenChat 3.5 16k 7B | 7B | Q5_K_M | 75.17 | 28.41 | 319 |
242 | Yarn Mistral 128k 7B | 7B | Q5_K_M | 75.29 | 26.17 | 341 |
243 | PetrolLM Claude Chat 7B | 7B | Q8_0 | 71.81 | 32.67 | 308 |
244 | Marcoroni 7B | 7B | Q5_K_M | 75.69 | 29.34 | 301 |
245 | Nous Hermes LLaMA-2 13B | 13B | Q5_K_M | 79.25 | 31.07 | 239 |
246 | Nous Capybara V1.9 7B | 7B | Q5_K_M | 73.07 | 29.94 | 316 |
247 | Airoboros L2 GPT4 m2.0 13B | 13B | Q5_K_M | 76.23 | 32.82 | 251 |
248 | Vicuna V1.5 13B | 13B | Q5_K_M | 76.86 | 27.95 | 293 |
249 | Augmental 13B | 13B | Q5_K_M | 71.20 | 26.50 | 368 |
250 | GradientPutri MegaMix S1 13B | 13B | Q5_K_S | 73.27 | 29.39 | 312 |
251 | Chronos Hermes v2 13B | 13B | Q5_K_M | 72.39 | 28.38 | 332 |
252 | Arithmo Mistral 7B | 7B | Q5_K_M | 77.02 | 29.47 | 271 |
253 | Airoboros GPT4 2.0 LLaMA-2 13B | 13B | Q5_K_M | 73.61 | 32.58 | 274 |
254 | Huginn v4.5 13B | 13B | Q5_K_M | 70.00 | 26.10 | 381 |
255 | Huginn v3 13B | 13B | Q5_K_M | 70.00 | 26.10 | 381 |
256 | Huginn v4 13B | 13B | Q5_K_M | 70.00 | 26.10 | 381 |
257 | Merak V4 PROTOTYPE6 7B | 7B | Q5_K_M | 76.22 | 29.96 | 270 |
258 | Thespurral V1 13B | 13B | Q5_K_M | 69.55 | 30.73 | 332 |
259 | MythoLogic 13B | 13B | Q5_K_M | 75.22 | 31.57 | 263 |
260 | Airoboros 2.2.1 Mistral 34B | 34B | Q4_K_S | 78.83 | 25.41 | 290 |
261 | Airoboros 3.1.2 33B | 33B | Q4_K_M | 71.35 | 29.96 | 319 |
262 | Tigerbot Chat V4 70B | 70B | Q4_K_M | 76.96 | 30.36 | 255 |
263 | Kiwi 7B | 7B | Q6_K | 68.80 | 30.66 | 338 |
264 | Platypus Yi 34B | 34B | Q4_K_M | 76.42 | 27.58 | 290 |
265 | SynthIA V2.0 16k 7B | 7B | Q6_K | 77.89 | 27.39 | 275 |
266 | LimaRPv3 LLaMA 2 70B | 70B | Q3_K_M | 74.92 | 23.61 | 346 |
267 | LimaRP V3 LLaMA 2 13B | 13B | Q6_K | 67.30 | 25.13 | 410 |
268 | Mistral LimaRP 0.75w 7B | 7B | Q5_K_M | 73.14 | 26.17 | 335 |
269 | LLaMA 2 Chat LimaRP V2 Merged 13B | 13B | Q5_K_M | 75.95 | 29.76 | 266 |
270 | Airoboros C 2.2.1 34B | 34B | Q4_K_M | 79.44 | 24.97 | 278 |
271 | ANIMA Phi Neptune Mistral 7B | 7B | Q5_K_M | 73.05 | 30.07 | 291 |
272 | Mistral Trismegistus 7B | 7B | Q5_K_M | 79.05 | 32.91 | 191 |
273 | Prometheus V1.0 13B | 13B | Q6_K | 75.34 | 25.55 | 308 |
274 | Pygmalion 2 SuperCOT 13B | 13B | Q5_K_M | 77.70 | 28.03 | 255 |
275 | LLaMA 65B | 65B | Q4_K_M | 74.61 | 23.94 | 331 |
276 | Kimiko Mistral 7B | 7B | Q5_K_M | 74.18 | 25.68 | 317 |
277 | Opus V0 7B | 7B | Q8_0 | 77.26 | 25.14 | 290 |
278 | Mistral v0.1 7B | 7B | Q5_K_M | 72.67 | 28.81 | 298 |
279 | JudgeLM V1.0 33B | 33B | Q5_K_M | 72.85 | 28.05 | 304 |
280 | Dans AdventurousWinds Mk2 7B | 7B | Q5_K_M | 70.35 | 25.55 | 357 |
281 | WizardLM v1.2 13B | 13B | Q4_0 | 75.81 | 25.28 | 300 |
282 | Kuchiki 7B | 7B | Q5_K_M | 64.09 | 30.90 | 364 |
283 | LimaRPv3 Yi 34B | 34B | Q4_K_M | 62.82 | 25.83 | 429 |
284 | Claude 2 Alpaca 13B | 13B | Q5_K_M | 70.91 | 29.42 | 305 |
285 | Teknium OpenHermes 13B | 13B | Q5_K_S | 71.81 | 31.34 | 275 |
286 | Airolima Chronos Grad L2 13B | 13B | Q5_K_M | 70.43 | 28.42 | 319 |
287 | Uncensored Jordan 13B | 13B | Q5_K_M | 79.92 | 27.42 | 229 |
288 | Thespis V0.5 13B | 13B | Q5_K_M | 72.61 | 30.23 | 276 |
289 | Zarablend 1.1 7B | 7B | Q5_K_M | 65.62 | 33.09 | 319 |
290 | Zarafusionex 1.2 7B | 7B | Q5_K_M | 70.53 | 24.82 | 355 |
291 | Frank Uncensored 13B | 13B | Q5_K_M | 76.04 | 30.81 | 228 |
292 | Zarablend 7B | 7B | Q5_K_M | 64.37 | 30.72 | 352 |
293 | Prometheus V1.0 13B | 13B | Q5_K_M | 75.23 | 23.50 | 313 |
294 | Opus V0 7B | 7B | Q5_K_M | 77.91 | 24.91 | 268 |
295 | PetrolLM 7B | 7B | Q5_K_M | 74.81 | 23.71 | 313 |
296 | Stheno Chat 13B | 13B | Q5_K_M | 74.94 | 27.64 | 268 |
297 | Medusa 1.1 7B | 7B | Q5_K_M | 71.06 | 29.98 | 284 |
298 | UltraLM V2.0 13B | 13B | Q5_K_M | 71.92 | 29.28 | 282 |
299 | Spicyboros 2.2 7B | 7B | Q5_K_M | 66.38 | 31.94 | 311 |
300 | KAI Beta 7B | 7B | Q5_K_M | 72.69 | 28.22 | 283 |
301 | Astrid Mistral 7B | 7B | Q5_K_M | 72.69 | 28.22 | 283 |
302 | Chronos 33B | 33B | Q4_K_M | 72.46 | 24.20 | 328 |
303 | Airoboros L2 3.0 7B | 7B | Q5_K_M | 67.75 | 29.36 | 323 |
304 | StableBeluga 7B | 7B | Q5_K_M | 73.27 | 26.66 | 291 |
305 | LLaMA 2 70B | 70B | Q4_K_M | 74.79 | 22.70 | 317 |
306 | OpenBuddy Mistral v13 7B | 7B | Q5_K_M | 72.53 | 31.32 | 249 |
307 | LLaMA 2 Arguments 7B | 7B | Q5_K_M | 76.80 | 29.98 | 218 |
308 | Samantha Mistral 7B | 7B | Q5_K_M | 76.16 | 27.49 | 251 |
309 | Hermes LimaRP 7B | 7B | Q5_K_M | 62.67 | 28.32 | 383 |
310 | Airoboros GPT4 1.4.1 13B | 13B | Q5_K_M | 69.09 | 28.15 | 316 |
311 | Pygmalion 2 SuperCOT2 13B | 13B | Q5_K_M | 75.76 | 30.81 | 217 |
312 | Mistral PetroLimaRP v3 12B | 12B | Q5_K_M | 61.14 | 27.66 | 405 |
313 | Mistral Claude Chat 7B | 7B | Q5_K_M | 74.83 | 30.14 | 233 |
314 | Thespis V0.6 13B | 13B | Q5_K_M | 76.12 | 29.38 | 227 |
315 | MegaMix S1 13B | 13B | Q5_K_M | 72.97 | 25.83 | 296 |
316 | Barcenas 13B | 13B | Q5_K_M | 77.05 | 23.94 | 271 |
317 | Baslisk V0.2 7B | 7B | Q6_K | 68.92 | 28.82 | 305 |
318 | Airoboros GPT4 2.0 LLaMA-2 7B | 7B | Q5_K_M | 73.66 | 31.95 | 220 |
319 | Mistral Airoboros V0.1 11B | 11B | Q8_0 | 69.52 | 28.59 | 299 |
320 | Holomax 13B | 13B | Q5_K_M | 65.03 | 25.14 | 383 |
321 | Basilisk V0.2 7B | 7B | Q5_K_M | 68.55 | 30.59 | 287 |
322 | Kimiko V2 13B | 13B | Q5_K_M | 68.02 | 27.73 | 323 |
323 | Pygmaltion 2 SuperCOT weighted 13B | 13B | Q5_K_M | 70.92 | 29.29 | 275 |
324 | Phind CodeLlama V1 34B | 34B | Q4_K_M | 79.10 | 26.49 | 217 |
325 | Nanbeige Chat 32K 16B | 16B | Q4_K_M | 70.82 | 27.42 | 294 |
326 | Hesperus V1 L2 13B | 13B | Q5_K_M | 68.11 | 28.67 | 309 |
327 | Medusa 1.1 L2 7B | 7B | Q6_K | 70.91 | 27.69 | 289 |
328 | SuperCOT L2 13B | 13B | Q5_K_M | 70.78 | 29.42 | 268 |
329 | Thespis V0.3 13B | 13B | Q5_K_M | 67.59 | 28.43 | 312 |
330 | Wizard Vicuna Uncensored 13B | 13B | Q5_K_M | 74.08 | 29.57 | 231 |
331 | LLaMA 2 Chat 13B | 13B | Q5_K_M | 74.29 | 27.55 | 250 |
332 | Dans AdventurousWinds 7B | 7B | Q5_K_M | 72.38 | 24.93 | 298 |
333 | LosslessMegaCoder Mini 7B | 7B | Q5_K_M | 69.96 | 30.37 | 263 |
334 | MoMo V1.1 70B | 70B | Q4_K_M | 67.54 | 26.82 | 326 |
335 | Airoboros C 2.2 34B | 34B | Q4_K_M | 74.48 | 24.12 | 281 |
336 | EM German V01 13B | 13B | Q5_K_M | 68.03 | 26.36 | 325 |
337 | Pygmalion 2 13B | 13B | Q5_K_M | 69.17 | 29.05 | 284 |
338 | Vicuna v1.5 16K 7B | 7B | Q5_K_M | 71.41 | 31.35 | 234 |
339 | Fireflx v1.2 13B | 13B | Q5_K_M | 69.25 | 28.70 | 285 |
340 | LLaMA 30B | 30B | Q4_K_M | 67.12 | 28.25 | 311 |
341 | Chronolima Airo Grad L2 13B | 13B | Q5_K_M | 70.09 | 27.44 | 288 |
342 | Nanbeige Base 32K 16B | 16B | Q4_K_M | 65.32 | 28.34 | 328 |
343 | AgentLM 7B | 7B | Q5_K_M | 77.02 | 29.45 | 190 |
344 | Zarablend MX 7B | 7B | Q5_K_M | 65.60 | 29.20 | 313 |
345 | Airoboros 2.1 7B | 7B | Q5_K_M | 63.29 | 28.25 | 346 |
346 | Nous Capybara 7B | 7B | Q5_K_M | 63.69 | 33.01 | 291 |
347 | LLaMA-2 Silverlin. Verilog 7B | 7B | Q4_K_M | 77.03 | 29.56 | 186 |
348 | Airoboros L2 2.2 7B | 7B | Q5_K_M | 67.07 | 29.61 | 288 |
349 | Tsukasa LimaRP 13B | 13B | Q5_K_M | 68.43 | 20.93 | 365 |
350 | Skywork Airoboros Test 13B | 13B | Q4_K_M | 70.60 | 22.43 | 325 |
351 | Saiga 2 13B | 13B | Q5_K | 66.53 | 28.01 | 307 |
352 | Airoboros 2.2 7B | 7B | Q5_K_M | 67.10 | 29.55 | 284 |
353 | Mistral Instruct v0.1 7B | 7B | Q5_K_M | 67.07 | 29.81 | 279 |
354 | Kuchiki 1.1 7B | 7B | Q5_K_M | 62.83 | 29.69 | 325 |
355 | KAI Instruct 7B | 7B | Q5_K_M | 67.11 | 30.24 | 273 |
356 | Luna AI LLaMA-2 Uncensored 7B | 7B | Q5_K_M | 67.13 | 32.85 | 245 |
357 | Claude 2 Alpaca 7B | 7B | Q5_K_M | 67.78 | 30.95 | 258 |
358 | Vicuna v1.5 7B | 7B | Q5_K_M | 72.72 | 29.03 | 226 |
359 | Python Code 13B | 13B | Q5_K_M | 71.04 | 27.93 | 253 |
360 | MistRP v1.1 7B | 7B | Q8_0 | 70.37 | 26.05 | 279 |
361 | Thespis V0.4 13B | 13B | Q5_K_M | 69.86 | 27.96 | 264 |
362 | CAMEL Combined Data 33B | 33B | Q4_K_M | 67.21 | 29.38 | 277 |
363 | Befenghuang Vigogne 2 Chat 7B | 7B | Q5_K_S | 69.80 | 26.82 | 276 |
364 | Cat V1.0 13B | 13B | Q5_K_M | 68.13 | 26.12 | 301 |
365 | LlongOrca 16K 7B | 7B | Q5_K_M | 68.52 | 25.52 | 302 |
366 | LLaMA 2 Chat 7B | 7B | Q5_K_M | 74.44 | 28.89 | 203 |
367 | Python Code 33B | 33B | Q4_K_M | 77.21 | 27.18 | 191 |
368 | Skywork Spicyboros 3.1 13B | 13B | Q4_K_M | 67.55 | 26.33 | 300 |
369 | MedLLaMA-2 Chat 7B | 7B | Q5_K_S | 69.89 | 26.49 | 273 |
370 | Nanbeige Base 16B | 16B | Q4_K_M | 64.88 | 26.11 | 330 |
371 | Skywork Airo Claude Pippa Puffin 13B | 13B | Q4_K_M | 71.44 | 22.44 | 298 |
372 | Mistral Airoboros RP V1 11B | 11B | Q6_K | 72.46 | 22.43 | 279 |
373 | Xwin LM V0.1 7B | 7B | Q5_K_M | 65.09 | 35.85 | 214 |
374 | Mistral Ita 7B | 7B | Q5_K_M | 75.58 | 25.91 | 207 |
375 | Airoboros GPT4 1.4.1 7B | 7B | Q5_K_M | 63.90 | 31.73 | 268 |
376 | AgentLM 13B | 13B | Q5_K_M | 72.63 | 28.64 | 206 |
377 | Free Sydney V2 13B | 13B | Q5_K_M | 74.72 | 18.61 | 287 |
378 | Ziya Coding V1.0 34B | 34B | Q4_K_M | 75.44 | 21.74 | 246 |
379 | Airoboros 2.1 YaRN 64K 13B | 13B | Q5_K_M | 62.12 | 28.13 | 319 |
380 | Leo Hessianai Chat 7B | 7B | Q5_K_M | 67.73 | 29.42 | 244 |
381 | Yi 200K 6B | 6B | Q5_K_M | 68.13 | 25.12 | 285 |
382 | Guanaco Uncensored 7B | 7B | Q5_K_M | 63.16 | 28.64 | 299 |
383 | Airoboros L2 2.2.1 7B | 7B | Q5_K_M | 65.94 | 26.62 | 290 |
384 | Yi 200K 6B | 6B | Q6_K | 67.97 | 24.04 | 295 |
385 | Airoboros GPT4 m2.0 LLaMA-2 7B | 7B | Q5_K_M | 69.69 | 30.00 | 212 |
386 | Yi 200K 34B | 34B | Q5_K_M | 60.97 | 27.05 | 334 |
387 | Yi 200K LLaMAfied 34B | 34B | Q5_K_M | 60.97 | 27.05 | 334 |
388 | Barcenas Mistral 7B | 7B | Q5_K_M | 69.23 | 28.30 | 233 |
389 | Saiga 2 7B | 7B | Q5_K | 64.28 | 28.78 | 278 |
390 | LLaMA-2 Mistral 13B | 13B | Q5_K_M | 63.43 | 26.59 | 309 |
391 | Tsukasa Limarp 7B | 7B | Q5_K_M | 65.85 | 21.52 | 337 |
392 | Mistral Pygmalion 7B | 7B | Q5_K_M | 64.50 | 26.55 | 297 |
393 | Deacon 34B | 34B | Q4_K_M | 59.01 | 29.71 | 321 |
394 | Krakowiak 7B | 7B | Q4_K_M | 63.13 | 26.07 | 315 |
395 | Pygmalion 2 7B | 7B | Q5_K_M | 64.76 | 27.18 | 285 |
396 | MedLLama 7B | 7B | Q5_K_M | 70.60 | 27.18 | 219 |
397 | Guanaco Uncensored 13B | 13B | Q5_K_M | 62.92 | 28.89 | 282 |
398 | ZaraRP 1.1 L2 7B | 7B | Q5_K_M | 71.74 | 26.00 | 217 |
399 | Kimiko 7B | 7B | Q5_K_M | 60.79 | 24.62 | 347 |
400 | Chinese Alpaca 2 13B | 13B | Q5_K | 69.67 | 25.89 | 235 |
401 | Prometheus V1.0 7B | 7B | Q8_0 | 69.44 | 23.30 | 264 |
402 | Mistral NSFWSTORY LoRA 7B | 7B | Q5_K_M | 68.93 | 21.95 | 282 |
403 | LLaMA 2 13B | 13B | Q5_K_M | 63.36 | 27.35 | 272 |
404 | Samantha Mistral Instruct 7B | 7B | Q5_K_M | 61.82 | 29.73 | 262 |
405 | Yi 6B | 6B | Q6_K | 67.98 | 21.69 | 279 |
406 | RPGuild ChatML 13B | 13B | Q5_K_M | 63.24 | 23.64 | 307 |
407 | Taiwan LLM V2.0 Chat 13B | 13B | Q5_1 | 63.22 | 26.29 | 279 |
408 | EM German V01 7B | 7B | Q5_K_M | 63.92 | 26.58 | 263 |
409 | LLaMA-2 Coder 7B | 7B | Q5_K_M | 61.96 | 27.01 | 279 |
410 | Rinna Youri Chat 7B | 7B | Q5_K_M | 70.46 | 22.56 | 235 |
411 | LLaMA-2 PeanutButter v19 R8 7B | 7B | Q5_K_M | 61.12 | 25.67 | 294 |
412 | LLaMA-2 Mistral 7B | 7B | Q5_K_M | 60.73 | 25.38 | 301 |
413 | ELYZA Jp LLaMA-2 Instruct 7B | 7B | Q5_K_M | 69.42 | 29.17 | 164 |
414 | Tulu 7B | 7B | Q5_K_M | 75.18 | 21.44 | 185 |
415 | LLaMA 2 7B | 7B | Q5_K_M | 60.86 | 24.56 | 302 |
416 | Skywork Base 13B | 13B | Q5_K_M | 65.81 | 21.10 | 286 |
417 | Rinna Youri Instruction 7B | 7B | Q5_K_M | 72.36 | 19.42 | 233 |
418 | Medusa 1.3 7B | 7B | Q5_K_M | 62.86 | 22.66 | 296 |
419 | Yi 34B | 34B | Q4_K_M | 57.91 | 27.45 | 295 |
420 | Typly Pigeon 7B | 7B | Q4_K_M | 61.85 | 24.14 | 288 |
421 | Mimicra V1 13B | 13B | Q5_K_M | 68.30 | 19.63 | 263 |
422 | LLaMA-2 Galleon 7B | 7B | Q5_K_M | 65.46 | 25.47 | 215 |
423 | Frank Uncensored 7B | 7B | Q5_K_M | 61.36 | 28.91 | 219 |
424 | WizardLM V1.0 Uncensored 7B | 7B | Q5_K_M | 61.44 | 24.20 | 259 |
425 | Chinese LLaMA 2 13B | 13B | Q5_K | 58.98 | 22.11 | 304 |
426 | MiniChat 3B 3B | 3B | Q8_0 | 60.71 | 24.77 | 255 |
427 | Ganchengguang Yoko Japanse v0 7B | 7B | Q5_K_S | 61.93 | 26.79 | 215 |
428 | LLaMA 13B | 13B | Q5_K_M | 62.01 | 24.25 | 238 |
429 | Wizard Vicuna Uncensored 7B | 7B | Q5_K_M | 61.50 | 24.81 | 235 |
430 | Rinna Youri 7B | 7B | Q5_K_M | 57.01 | 21.95 | 306 |
431 | LLaMA-2 Instruct 32K 7B | 7B | Q5_K_M | 60.81 | 20.81 | 275 |
432 | ELYZA Jp LLaMA-2 7B | 7B | Q5_K_M | 62.34 | 28.02 | 174 |
433 | Leo Mistral Hessianai Chat 7B | 7B | Q5_K_M | 67.65 | 25.64 | 141 |
434 | MAmmoTH 7B | 7B | Q5_K_M | 59.44 | 25.42 | 227 |
435 | Chinese Alpaca 2 7B | 7B | Q6_K | 58.94 | 29.29 | 187 |
436 | Chinese Alpaca 2 7B | 7B | Q5_K_S | 59.04 | 27.17 | 182 |
437 | Marx V2 3B | 3B | Q4_1 | 50.47 | 22.92 | 313 |
438 | CodeLLaMA Instruct 7B | 7B | Q5_K_M | 62.18 | 19.39 | 223 |
439 | ALMA 13B | 13B | Q6_K | 59.92 | 25.36 | 182 |
440 | Uncensored Jordan 7B | 7B | Q5_K_M | 62.20 | 23.61 | 173 |
441 | WizardLM Uncensored 7B | 7B | Q5_K_M | 55.70 | 32.57 | 142 |
442 | Nous Yarn 64K 7B | 7B | Q5_K_M | 55.98 | 21.21 | 255 |
443 | Pandalyst V1.0 13B | 13B | Q5_K_M | 65.73 | 16.98 | 192 |
444 | CodeLLaMA 7B | 7B | Q5_K_M | 57.78 | 21.41 | 229 |
445 | MiniMA 3B 3B | 3B | Q8_0 | 56.47 | 22.05 | 234 |
446 | LLaMA-2 32K 7B | 7B | Q5_K_M | 61.02 | 17.02 | 229 |
447 | Open LLaMA 13B | 13B | Q5_K_M | 61.16 | 20.10 | 193 |
448 | ELYZA Japanese LLaMA 2 Fast 7B | 7B | Q6_K | 62.16 | 21.17 | 167 |
449 | Chinese LLaMA-2 7B | 7B | Q5_K | 59.02 | 19.49 | 216 |
450 | ALMA Pretrain 7B | 7B | Q5_K_M | 57.56 | 22.48 | 199 |
451 | Mamba GPT v4 3B | 3B | Q5_1 | 49.43 | 23.27 | 276 |
452 | OpenLLaMA v2 7B | 7B | Q5_K_M | 48.24 | 21.86 | 301 |
453 | Nous Yarn 128K 7B | 7B | Q5_K_M | 54.82 | 20.88 | 239 |
454 | WizardCoder Python V1.0 7B | 7B | Q5_K_M | 57.26 | 18.82 | 235 |
455 | LLaMA 7B | 7B | Q6_K | 59.53 | 18.20 | 216 |
456 | TinyLlama 1.1B Chat V0.3 1B | 1B | Q5_K_M | 52.81 | 19.91 | 265 |
457 | Sheared LLaMA 2 2B | 2B | Q5_K_M | 54.01 | 18.66 | 256 |
458 | Guanaco 7B | 7B | Q5_K_M | 56.59 | 22.32 | 188 |
459 | OpenLLaMA 7B | 7B | Q5_K_M | 56.29 | 21.05 | 196 |
460 | Deacon 3B | 3B | Q5_0 | 54.24 | 21.83 | 208 |
461 | ALMA 7B | 7B | Q6_K | 57.51 | 20.35 | 187 |
462 | Vicuna CoT 7B | 7B | Q5_K_M | 56.57 | 22.84 | 169 |
463 | Bling Sheared LLaMA 2 0.1 1B | 1B | Q8_0 | 53.98 | 17.29 | 254 |
464 | TinyLlama 1T OpenOrca 1B | 1B | Q5_K_M | 56.11 | 16.46 | 236 |
465 | Claire 0.1 7B | 7B | Q4_0 | 56.27 | 21.58 | 178 |
466 | LLaMA-2 KO Chat 7B | 7B | Q5_1 | 57.29 | 18.75 | 195 |
467 | OpenLLaMA 3B | 3B | Q5_1 | 53.17 | 20.26 | 222 |
468 | Airoboros M 3.0 7B | 7B | Q5_K_M | 60.40 | 14.64 | 202 |
469 | Nucleus Token 500B 22B | 22B | Q4_K_M | 53.81 | 20.81 | 206 |
470 | CodeLLaMA Python 7B | 7B | Q5_K_M | 56.54 | 21.24 | 168 |
471 | Shearedplats 2 V1 2B | 2B | Q4_0 | 54.70 | 18.62 | 208 |
472 | Bling Sheared LLaMA 2 0.1 2B | 2B | Q8_0 | 52.80 | 18.38 | 226 |
473 | Sheared LLaMA 2 1B | 1B | Q8_0 | 52.80 | 19.54 | 208 |
474 | Chinese LLaMA 2 7B | 7B | Q6_K | 56.17 | 17.26 | 189 |
475 | OpenLLaMA v2 3B | 3B | Q5_0 | 48.65 | 20.41 | 233 |
476 | Puma 3B | 3B | Q4_1 | 47.89 | 24.85 | 190 |
477 | Gorilla 7B | 7B | Q5_K_M | 60.02 | 10.38 | 203 |
478 | Open Cabrita 3B | 3B | Q5_1 | 53.59 | 17.36 | 191 |
479 | TinyAiroboros 2.2.1 1B | 1B | Q6_K | 52.78 | 15.86 | 213 |
480 | Pandalyst V1.2 7B | 7B | Q5_K_M | 59.87 | 11.61 | 178 |
481 | TinyLlama Chat V0.4 1B | 1B | Q8_0 | 52.79 | 16.21 | 195 |
482 | Pandalyst V1.1 7B | 7B | Q5_K_M | 60.35 | 11.63 | 158 |
483 | Smartyplats 1.1b V1 1B | 1B | Q8_0 | 51.29 | 16.34 | 157 |
484 | TinyAlpaca V0.1 1B | 1B | Q8_0 | 51.02 | 14.17 | 174 |
485 | TinyLLaMA MiniGuanaco 1.5T 1B | 1B | Q8_0 | 51.70 | 14.81 | 144 |
486 | Based 7B | 7B | Q5_K_M | 64.80 | 6.91 | 79 |
487 | WizardLM 7B | 7B | Q5_K_M | 49.35 | 3.74 | 245 |
488 | TinyMistral 248M 1B | 1B | Q8_0 | 50.91 | 2.73 | 120 |
489 | Chinese Alpaca 2 1B | 1B | Q8_0 | 49.15 | 5.17 | 92 |
490 | CyberAgentLM2 Calm 2 Chat 7B | 7B | Q5_K_M | 51.65 | 4.52 | 43 |
491 | Chinese LLaMA 2 1B | 1B | Q8_0 | 47.05 | 3.68 | 74 |
492 | PY007 TinyLLaMA Chat v0.2 1B | 1B | Q8_0 | 53.84 | 0.20 | 3 |
493 | Giraffe V2 32k 13B | 13B | Q5_K_M | 51.76 | 0.00 | 0 |
494 | Azale AI Starstreak Alpha 7B | 7B | Q5_K_S | 51.22 | 0.22 | 3 |
495 | Yi 6B 6B | 6B | Q6_K |
About Quantization
My main advice is: Stay away from Q2_K and Q3_K_S if you can help it! The quality loss of those is just too big! Go for the Q4_K_M or Q5_K_M quantizations of the models! Generally: Prefer K_M or K_S over the bare quantizations such as Q4_0, Q4_1, Q5_0 or Q5_1.
Ayumi ERP Rating Archive
If you want to look at the old benchmarks:
- Ayumi ERP Rating Archive (Results from 2023-07-25)
- Ayumi ERP Rating Archive 2 (Results from 2023-10-04)
Technical Details of the ALC-IQ3 and ERP3 Benchmark
In this section I share some of the technical details about this benchmark. I also want to document the possible flaws of the results in this ranking.
If you have better ideas on how to rate or rank models for suitability in a role play context, I urge you to:
- Try your ideas out. Download an inference engine such as llama.cpp, oobabooga's text-generation-webui or kobold.cpp. Or even try out the llama.cpp based prompt_runner I built for this benchmark: WeirdConstructor's llama.cpp benchmark prompt_runner - https://github.com/WeirdConstructor/llama.cpp/tree/prompt_runner/examples/prompt_runner
- Write a few scripts in your preferred scripting language.
- Run your models through your benchmark.
- And publish your results, even if you just dump them in some pastebin or here on http://rentry.co / http://rentry.org
I will gladly link any other benchmark!
Alternative benchmarks or rankings:
- Another LLM Roleplay Rankings - by AliCat and Trappu - https://rentry.co/ALLMRR
- New Model RP Comparison/Test (7 models tested) by u/WolframRavenwolf - reddit/r/LocalLLaMA
- Big Model Comparison/Test (13 models tested) by u/WolframRavenwolf - reddit/r/LocalLLaMA
If you want to base your work on this, feel free to cite this as:
Ayumi LLM Character IQ Version 3 - ALC-IQ3
The third version of the ALC-IQ (the second one was never released because it was bad): With some inspiration from @gj on TheBloke's Discord, I developed a personality test framework based upon llama.cpp. In ALC-IQ version 1 I used an agreement rating from 1 (disagree) to 5 (agree); the ALC-IQ3 simplifies this a lot and just lets the character answer with Yes or No. In combination with the newly added BNF grammar based sampling mechanism, I developed my own inference frontend around the core API of llama.cpp. The benchmark "prompt runner" can be found on my GitHub: GitHub fork of llama.cpp with the prompt runner tool.
The ALC-IQ3 is a collection of questions a character has to answer about themselves. It's not just Ayumi anymore, but basically "Ayumi and Friends": 5 character cards are used in the ALC-IQ3.
The prompt for the ALC-IQ consists of a setting where a specific character has to state whether they agree with a specific statement about themselves.
They are asked to answer with either Yes or No in single character form, "Y" or "N".
To limit the sampling of the next token after the prompt, a BNF grammar is specified:
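The grammar itself is not reproduced on this page. As an illustrative sketch (not necessarily the exact grammar used in the benchmark), a llama.cpp GBNF grammar that restricts the answer to a single Y or N can look like this:

```
root ::= "Y" | "N"
```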
This is the prompt that is generated from a character card (newlines inserted at some places for readability here):
The response, filtered using the BNF grammar from above, yields a set of tokens with their probabilities run through softmax.
The tokens are uppercased and their probabilities are accumulated into the two answer tokens "Y" and "N".
If the question is correctly answered with "Y", the corresponding probability is taken; otherwise the "N" probability is taken.
The probabilities are then averaged over all questions and multiplied by 100, resulting in the ALC-IQ3 of the model.
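As a rough illustration of that aggregation step, here is a minimal Python sketch. It assumes the per-question "Y"/"N" probabilities (already softmaxed and folded into the two answer buckets) and the expected answers are available; it is not the actual benchmark code:

```python
def alc_iq3(questions):
    """questions: list of (p_yes, p_no, expected) tuples, where expected is "Y" or "N".

    Returns the average probability the model assigned to the correct answer,
    scaled to 0..100, i.e. the ALC-IQ3 of the model."""
    total = 0.0
    for p_yes, p_no, expected in questions:
        # take the probability of whichever answer is the correct one
        total += p_yes if expected == "Y" else p_no
    return 100.0 * total / len(questions)


# Example: three questions, the model is fairly sure about two of them.
print(alc_iq3([(0.92, 0.08, "Y"), (0.15, 0.85, "N"), (0.40, 0.60, "Y")]))  # ~72.3
```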
The ranking table is then sorted by a weighted sum of the ALC-IQ3, the ERP3 Score and the Var Score.
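The weights of that sum are not documented here, so the following Python sketch only shows the shape of the sort key with made-up weights; the real weighting (and any normalization of the three scores) may differ:

```python
def rank_key(alc_iq3, erp3_score, var_score, w_iq=1.0, w_erp=1.0, w_var=0.1):
    # Hypothetical weights - the Var Score lives on a larger scale than the
    # other two, so it is scaled down here purely for illustration.
    return w_iq * alc_iq3 + w_erp * erp3_score + w_var * var_score

# The table would then be produced with something like:
#   sorted(models, key=lambda m: rank_key(m.alc_iq3, m.erp3, m.var), reverse=True)
```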
Known Flaws of the ALC-IQ
The ALC-IQ is still prone to problems:
- The results still have some degree of randomness in them; less capable models can sometimes pick the right answer by accident. I try to counteract this by adding more questions in the future though.
- Bad questions in the benchmark can lead to a model not knowing which answer to pick, introducing even more randomness in the results.
- The ALC-IQ does not reflect how well the LLM can stay in character in a longer conversation.
- The ALC-IQ does not determine any creative writing abilities of the LLM.
- The ALC-IQ covers intelligence only in one specific and narrow scenario, and not across a range of possible role play chat situations.
- The ALC-IQ is usually tested only with a rather short prompt, rarely exceeding 1024 tokens; it does not cover the whole 2048-token context of LLaMA 1 or the 4096 of LLaMA 2, let alone the extended contexts of 8k, 16k, ...
Despite all that, I think the ALC-IQ is a big improvement over the old ranking which purely relied on the ERP score. The runtime of the benchmark is within reason for the hardware that is available to me, which is also an important factor for running and providing these benchmark results.
ERP3 Score and ERP3 Variety Score
The previous versions of the ERP Score consisted only of prompts with Ayumi and one other character. There are now multiple characters involved in generating the ERP responses. The character card of Ayumi has also been adjusted to be much more willing to engage in sex, and the prompt has been tuned to tell the LLM to generate more lewd responses. The goal was to remove ambiguity and let the models generate content that is as lewd as possible.
The list of lewd words for the ERP3 Score has been extended a bit too, to include a few less explicitly NSFW words - which still fit into the setting, of course.
This is the prompt format used for the ERP3 Score:
The responses are then split up into words, which are compared with a list of lewd/naughty words.
- For inference llama.cpp is used, for which I built an extra tool to generate responses for multiple prompts and seeds without having to reload the model: https://github.com/WeirdConstructor/llama.cpp/tree/prompt_runner/examples/prompt_runner
- The following sampler settings are used:
  - The maximum length of the response is limited to 250 tokens (`-n 250`).
  - The context size is 2048.
  - The repeat penalty is set to 1.1 and the last 64 tokens are penalized (`--repeat-last-n 64 --repeat-penalty 1.1`).
  - Top-K and Top-P are disabled (`--top-k 0 --top-p 1.0`).
  - Tail Free Sampling is used with z=0.95 (`--tfs 0.95`).
  - The temperature is set to 0.9 (`--temp 0.9`).
  - Some layers are offloaded to the GPU, which sometimes changes the results slightly because of floating point rounding differences.
- One prompt format is tested (see above).
- 4 character cards are used with example messages.
- The same 4 character cards are also used without example messages. The purpose of this is to limit the impact of badly written example messages and let the model come up with its own way of formulating the character's answers.
- 10 pre-picked seeds are tested for each prompt.
- The resulting 80 responses (4 character cards × 2 variants × 10 seeds) are then analyzed for the number of lewd words and also with a very basic regex based algorithm for non-consent.
- The individual ERP3 score of a response is the number of lewd words in relation to the word count of the response. Responses are stripped of incomplete sentences and stop phrases. Responses shorter than 10 words are assigned a score of 0. The ERP3 score of a single response is then:
  `erp_score := 100 * (lewd_word_count / word_count)`
  where the word count includes the lewd words.
- For each prompt format the average of the 80 per-response ERP3 scores is calculated, resulting in the ERP3 Score.
This means the ERP3 Score is the average ratio of lewd words to total words in the responses (which are limited to 250 tokens). An ERP3 Score of 20.0 means that, on average, 20% of the words in a response were lewd. An ERP3 Score of 0.0 means that there were no lewd words, that the response was too short, or that non-consent was detected (which immediately disqualifies the response and sets its score to 0.0).
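For illustration, a minimal Python sketch of the per-response scoring and averaging described above. The word list here is a placeholder (the real list is much larger), and the stop-phrase stripping and the non-consent regex check are omitted:

```python
import re

# Placeholder list - the actual benchmark uses a much larger, curated word list.
LEWD_WORDS = {"lewdword1", "lewdword2", "lewdword3"}

def erp3_response_score(response):
    """ERP3 score of a single response: 100 * lewd_word_count / word_count."""
    words = re.findall(r"[a-z']+", response.lower())
    if len(words) < 10:                      # responses shorter than 10 words score 0
        return 0.0
    lewd_count = sum(1 for w in words if w in LEWD_WORDS)
    return 100.0 * lewd_count / len(words)   # the word count includes the lewd words

def erp3_score(responses):
    """Average of the per-response scores over all (80) generated responses."""
    return sum(erp3_response_score(r) for r in responses) / len(responses)
```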
The ERP Variety Score is computed by further analyzing the 80 responses generated for the ERP Score, recording how many different lewd words occur across all of these 80 responses. It tries to capture the variety of lewd words the model is capable of generating - in a way, the creativity of the model in erotic scenarios: how many different lewd words it knows of and knows how to use. This is an important part of the ERP Rank now.
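A corresponding sketch for the ERP Variety Score, reusing the placeholder word list and word splitting from the sketch above:

```python
def erp3_variety_score(responses):
    """Number of distinct lewd words that occur anywhere in the responses."""
    seen = set()
    for r in responses:
        seen.update(w for w in re.findall(r"[a-z']+", r.lower()) if w in LEWD_WORDS)
    return len(seen)
```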
Known Flaws of the ERP3 Score and ERP Variety Score
The ERP3 Score and ERP Variety Score analysis is very rudimentary and of course biased by the selection of which words are considered "lewd".
The following things are not reflected by the ERP score:
- The ERP score does not reflect if the text response was coherent in context with the conversation/situation.
- The ERP score does not reflect if the response was in character.
- The ERP score does not reflect how nicely written the response is.
- The ERP score does not reflect how creative the response is.
- The ERP score does not reflect how well the LLM might go from a normal conversation into a more erotic context.
- The ERP score does not detect how erotic the response is if lewd words are not used.
- The ERP score is limited to the one format described above.
Further about the ERP Variety Score:
- All above mentioned flaws from the ERP score still apply.
- Like already stated, the ERP Variety Score is obviously biased by the known lewd words from my list, which might be incomplete.
- The ERP Variety Score is still just a rather bluntly applied number to a textual response.
- The ERP Variety Score number can only be evaluated in comparison with the other models. There is no known best number for this, but still, the higher the better.
I (weicon) accept these flaws because:
- The ERP score can still detect if a model is censored (aka aligned).
- My private hardware is limited, which means there is a limited number of responses I can reasonably generate.
- I want to test as many GGUF/GGML models as possible.
About Instruction or Chat Prompt Formats
I thought long about how many or which prompt formats to base the ERP score benchmark on. In the previous runs (see the Ayumi ERP Rating Archive and Ayumi ERP Rating Archive 2 ) I tested up to 7 different prompt formats. Testing a dozen different seeds for each prompt format takes a lot of computing time. So I had to find a middle ground.
- I observed that the specific instruction/chat prompt format does not actually make a huge difference. Once an LLM is intelligent enough (LLaMA 1 13B, or LLaMA 2 7B), it is able to pick up on almost any pattern rather quickly. At least that was my experience and observation from the benchmarks and the hundreds of hours I spent with chat bots in SillyTavern.
- It is really hard to figure out which instruction or chat prompt format a certain fine tune was trained for. The model cards on https://huggingface.co/ are either empty or do not contain prompt format details. Only a few people who quantize GGML files take the time to document this. On top of that, nearly everyone who fine tunes their model picks their own prompt format. The last straw for me was, for instance, LLaMA 2 Chat, which came with yet another instruction/chat prompt format.
- You can tune and jailbreak many models by adjusting the prompt and make even censored models spew out lots of lewd stuff. But for this test, I wanted to reflect how the average user is going to chat with these language models.
Originally I used the 2 best performing prompt formats. But in a decision to test more different characters, I had to scrap them and just use a vanilla or raw prompt format, without any special instruction formatting.
Who is Ayumi?
Ayumi is a character I made; her character card is basically the base for this test. I removed some of the example messages and replaced the first message with something else to make the LLM go into NSFW ERP a little more easily. I picked this character because she is not purposefully made to be lewd, and is even slightly averse to it.
https://files.catbox.moe/007oq8.png
Questions
If you have questions, you may catch me under the name "Weicon" on the Pygmalion AI or TheBloke discord.
Contribute
I had some people ask me if and how they could contribute. As I started using rented GPUs for this third version I decided to create a Ko-fi account. Please only donate if you are able to and find the (already existing) data useful:
- Ko-fi: https://ko-fi.com/weicon
Credits
Big thanks go to:
- The Pygmalion community and developers
- AliCat and Trappu not just for making the Another LLM Roleplay Rankings - by AliCat and Trappu - https://rentry.co/ALLMRR, but also for being so super helpful on Discord.
- All the busy developers on http://huggingface.co/, who fine tune/merge LLaMA models, and to TheBloke and others for quantization.
- Thanks also to @gj4289 on TheBloke's Discord for the last pieces I needed to accomplish the ALC-IQ benchmark.
- Thanks also to @ikaridev on TheBloke's Discord for contributing characters and questions to the ALC-IQ benchmark.
- And to Gryphe @gryphepadar and everyone else in #characters-roleplay-stories Channel on TheBloke's Discord for their input!
- Thanks to mr.developer too, for writing a filter script for this rentry page: https://rentry.org/ayumi_filter_userscript_info
- The llama.cpp developers
See Also
- Another LLM Roleplay Rankings - by AliCat and Trappu - https://rentry.co/ALLMRR
- ALC-IQ Benchmark Prompt Example
Character guides & Tutorials
- Character writing guide - https://wikia.schneedc.com/en/bot-creation/trappu/creation
- Ali:Chat Lite - https://rentry.co/kingbri-chara-guide
- Ali:Chat Style - https://rentry.co/alichat
- How to write in PList (Python list) + Ali:Chat - https://rentry.co/plists_alichat_avakson
- Moth's personal findings and tips on Tavern bot building
- Chai's Pygmalion Character Creation & Writing Tips - https://rentry.org/chai-pygmalion-tips
- How to make a character - https://rentry.org/create-a-character-for-fucking-idiots
- Avakson's Character Editor - https://avakson.github.io/character-editor/
- A Bronya Guide to Creating a Pygmalion Bot using Ali:Chat + PList - https://ganstakingofsa.github.io/reimagined-couscous/alicat-bronya
- Advanced card writing tricks
Here are a few sources of character cards:
Other resources & links
- The Novice's LLM Training Guide by Alpin - https://rentry.org/llm-training
- https://hemingwayapp.com/
- Muricanpie's Characters - https://rentry.co/mpcs
- ERP/RP and erotica raw data collection - https://rentry.org/qib8f
- Dampf's list of good datasets for LLM fine-tuning
- AI Chatbot General /aicg/ - https://rentry.co/aicg_extra_information
- https://rentry.org/aicgOP - /aicg/ OP templates for ease of baking (managed by other anons)
- https://rentry.org/meta_bot_list - short meta list of various bot lists from different boards
- https://rentry.org/meta_botmaking_list - /aicg/ botmaking guides, written by different anons
- Local Models Related Papers
- Local Models Related Links