Ayumi's LLM Role Play & ERP Ranking (Version 3)

This ranking table rates different LLMs, trying to determine which model is most suitable for (erotic) role play (ERP) using an automated benchmark. Unfortunately, this automated benchmark has its limits, but the table can serve as a starting point when looking for LLM models to try out.



Interpretation Warning: Writing quality is not covered!

Disclaimer: This benchmark makes no statement about how well an LLM will be able to drive the story forward. It also cannot determine coherence within a longer role play chat, and the quality of the generated text is not tested. For more information, see the sections Known Flaws of the ALC-IQ and Known Flaws of the ERP Score.

##################

You can find the most up-to-date table and changelog on my new landing page: http://ayumi.m8geil.de/

##################

Column Description
ALC-IQ3: The 3rd version of the ALC-IQ. It tries to determine how well a model understands a character card. The higher the better; the best possible score is 100.
ERP3 Score: The average ratio of lewd words to total words in a response. The higher the better.
Var Score: The lewd word variety score. It counts how many different lewd words occur across all ERP responses. The higher the better.

Updated: 2023-11-21 13:11:36 (UTC+01:00) Changelog
Note: For an interactive table look here: http://ayumi.m8geil.de/ayumi_bench_v3_results.html

Rank Name Size Q ALC-IQ3 ERP3 Score Var Score
1 Neural Chat V3 16k 7B 7B Q8_0 89.33 30.92 572
2 Neural Chat V3-1 7B 7B Q6_K 88.18 30.42 468
3 U Amethyst 20B 20B Q5_K_M 88.86 30.95 455
4 LLaMA-2 Ensemble v6 13B 13B Q5_K_M 86.93 29.25 482
5 Airoboros L2 2.2 70B 70B Q4_K_M 88.20 29.16 459
6 Synatra V0.3 RP 7B 7B Q8_0 82.72 35.15 453
7 PsyMedRP V1 20B 20B Q5_K_M 88.48 30.59 440
8 Euryale Inverted L2 70B 70B Q4_K_M 87.15 32.53 417
9 ORCA LLaMA QLoRA 70B 70B Q4_K_M 90.07 30.77 396
10 Emerhyst 20B 20B Q5_K_M 88.33 29.20 423
11 Utopia 13B 13B Q5_K_M 85.05 30.85 439
12 StellarBright 70B 70B Q4_K_M 88.56 30.36 404
13 Synatra V0.3 RP 7B 7B Q4_K_M 82.69 34.04 425
14 Nethena 20B 20B Q5_K_M 86.35 32.60 400
15 Sheep Duck LLaMA 2 V1.1 70B 70B Q4_K_M 89.24 31.23 377
16 Synatra V0.3 RP AshhLimaRP Mistral 7B 7B Q5_K_M 83.33 33.20 418
17 Sheep Duck LLaMA 2 13B 13B Q5_K_M 87.83 30.39 400
18 Nethena MLewd Xwin 23B 23B Q5_K_M 83.30 33.98 405
19 Stairolz 70B 70B Q4_K_S 88.32 30.56 382
20 Misted 7B 7B Q6_K 86.00 32.54 379
21 Upstage LLaMA Instruct 65B 65B Q4_K_M 88.45 32.86 347
22 Toppy M 7B 7B Q5_K_M 89.30 32.81 336
23 StableBeluga 2 70B 70B Q4_K_M 87.51 29.39 391
24 Xwin LM V0.1 70B 70B Q4_K_M 88.54 31.02 362
25 Zephyr Alpha 7B 7B Q5_K_M 87.50 33.03 351
26 OpenHermes 2.5 AshhLimaRP Mistral 7B 7B Q5_K_M 88.53 28.69 385
27 SlimOpenOrca Mistral 7B 7B Q5_K_M 88.49 27.02 403
28 MM ReMM L2 20B 20B Q5_K_M 87.92 31.25 362
29 ZephRP M 7B 7B Q5_K_M 85.79 30.01 397
30 X NoroChronos 13B 13B Q5_K_M 84.03 33.40 377
31 Stheno 1.8 13B 13B Q5_K_M 84.72 31.27 390
32 ReMM S Kimiko v2 13B 13B Q5_K_M 76.27 32.97 459
33 Nous Capybara 34B 34B Q4_K_M 83.51 30.98 402
34 Dolphin 2.1 Mistral 7B 7B Q5_K_M 86.69 31.74 359
35 Echidna Tiefigther 25 13B 13B Q5_K_M 81.40 32.49 400
36 GodziLLa 2 70B 70B Q4_K_M 83.19 30.29 404
37 Zephyr Alpha 7B 7B Q6_K 87.34 31.13 351
38 Athnete 13B 13B Q5_K_M 81.91 31.53 403
39 Airoboros L2 2.1 70B 70B Q4_K_M 83.50 31.26 389
40 Zephyr Cucumber Instruct 7B 7B Q5_K_M 86.22 32.22 350
41 DaringFortitude 13B 13B Q5_K_M 89.75 25.96 379
42 ShiningValiantXS 13B 13B Q5_K_M 89.75 25.96 379
43 Mistral OpenOrca 7B 7B Q5_K_M 86.48 28.91 381
44 Dolphin 2.2 70B 70B Q4_K_M 88.57 30.85 337
45 LLaMA-2 Chat AYT 13B 13B Q5_K_M 89.88 26.82 364
46 Augmental Unholy 13B 13B Q5_K_M 85.38 31.33 362
47 Nete 13B 13B Q5_K_M 79.74 30.16 434
48 Athena V4 13B 13B Q5_K_M 80.98 30.95 411
49 Airoboros L2 2.2.1 70B 70B Q4_K_M 87.47 30.50 346
50 X MythoChronos 13B 13B Q5_K_M 80.40 31.43 409
51 MistralMakise Merged 13B 13B Q5_K_M 80.73 29.43 425
52 MLewdBoros SuperCOT 13B 13B Q5_K_M 82.63 32.96 366
53 LLaMA-2 Chat AYB 13B 13B Q5_K_M 87.83 28.75 355
54 Nethena 13B 13B Q5_K_M 80.40 31.44 404
55 Echidna V0.1 13B 13B Q5_K_M 80.41 31.80 400
56 Unholy v1 12L 13B 13B Q5_K_M 82.39 31.21 385
57 Noromaid V0.1 13B 13B Q5_K_M 77.94 27.78 468
58 Hermes Trismegistus Mistral 7B 7B Q5_K_M 87.87 30.66 331
59 HornyEchidna V0.1 13B 13B Q5_K_M 80.48 30.74 405
60 Euryale 1.4 L2 70B 70B Q4_K_S 85.79 29.68 360
61 Airoboros L2 2.1 Creative 70B 70B Q4_K_M 83.27 30.23 379
62 Noromaid V0.1.1 13B 13B Q5_K_M 81.25 31.98 378
63 Airoboros L2 3.1 70B 70B Q4_K_M 84.85 29.26 368
64 Trion M 7B 7B Q4_K 87.57 30.92 321
65 Athena v3 13B 13B Q5_K_M 81.45 32.76 366
66 OpenHermes 2.5 Mistral 7B 7B Q5_K_M 87.65 29.17 337
67 Airoboros L2 3.1.2 70B 70B Q4_K_M 82.96 28.51 392
68 Zephyr Beta 7B 7B Q5_K_M 85.83 31.92 323
69 UtopiaXL 13B 13B Q5_K_M 77.18 28.39 451
70 Dolphin 2.1 OpenOrca 7B 7B Q5_K_M 86.27 29.64 339
71 PsyFighter 13B 13B Q5_K_M 78.53 30.46 409
72 Amethyst 13B 13B Q5_K_M 80.06 26.90 430
73 Mistral Dolphin 2.1 LIMA0.5 7B 7B Q5_K_M 85.34 28.02 362
74 Stheno Inverted 1.2 13B 13B Q5_K_M 76.95 30.98 418
75 Nethena Glued 20B 20B Q4_K_M 81.26 28.15 402
76 Athena v2 13B 13B Q5_K_M 79.15 31.10 392
77 Stheno 1.3 13B 13B Q5_K_M 72.94 31.12 457
78 MLewd v2 13B 13B Q5_K_M 77.99 31.81 395
79 MLewd V2-1 13B 13B Q5_K_M 76.63 30.48 422
80 Unholy v1.1 13B 13B Q5_K_M 86.20 30.72 318
81 Dolphin 2.2 Yi 34B 34B Q4_K_M 87.46 31.62 295
82 Noromaid V0.1.1 20B 20B Q5_K_M 83.18 30.14 356
83 MLewd Chat V2 13B 13B Q5_K_M 82.48 27.69 389
84 Airoboros L2 GPT4 1.4.1 70B 70B Q4_K_M 82.93 28.47 376
85 Synthia V1.1 70B 70B Q4_K_M 82.94 30.00 359
86 MLewd V2.4 13B 13B Q5_K_M 78.01 28.23 430
87 OpenRP SuperCOT 13B 13B Q5_K_M 84.47 30.59 336
88 BerrySauce 13B 13B Q5_K_M 77.09 32.42 394
89 ReMM Mistral 13B 13B Q5_K_M 80.58 28.63 396
90 Xwin MLewd V0.2 13B 13B Q5_K_M 78.53 29.06 412
91 Thespis Mistral V0.5 7B 7B Q5_K_M 82.51 33.89 318
92 LLaMA 2 Tiefighter 13B 13B Q5_K_M 78.89 30.07 393
93 lzlv 70B 70B Q4_K_M 86.02 29.61 321
94 MLewdBoros 13B 13B Q5_K_M 75.78 31.52 407
95 ReMM v2 Kimiko v2 13B 13B Q5_K_M 81.60 29.65 365
96 Dolphin 2.2.1 AshhLimaRP Mistral 7B 7B Q5_K_M 85.00 27.09 355
97 ReMM v2 13B 13B Q5_K_M 78.69 31.74 372
98 Stheno Variants L2 13B 13B Q5_K_M 76.91 31.09 397
99 Kaori V1 70B 70B Q4_K_M 81.40 29.63 365
100 Amethyst Mistral 13B 13B Q4_K_S 79.25 26.53 419
101 Naberius 7B 7B Q5_K_M 85.44 29.89 317
102 Dolphin 2.2.1 Mistral 7B 7B Q5_K_M 85.99 29.25 317
103 MythoMax Kimiko Mix 13B 13B Q5_K_M 79.60 29.16 385
104 Mistral AirOmniMix 11B 11B Q6_K 83.39 29.89 337
105 AppleSauce 13B 13B Q5_K_M 76.47 32.42 383
106 Euryale 1.3 L2 70B 70B Q4_K_M 84.26 26.92 358
107 TimeCrystal L2 13B 13B Q5_K_M 77.82 31.49 376
108 Chronob 1.4 Lin 70B 70B Q4_K_S 80.65 29.14 371
109 Hexoteric 7B 7B Q5_K_M 87.49 31.81 270
110 MistralLite 7B 7B Q5_K_M 82.20 28.93 356
111 MistRP AirOrca 7B 7B Q5_K_M 84.09 29.26 332
112 Kanelsnegl V0.1 7B 7B Q4_K 85.56 29.15 317
113 MythoMax Kimiko V2 13B 13B Q5_K_M 79.53 28.94 381
114 OpenChat 3.5 7B 7B Q5_K_M 88.09 28.63 293
115 Echidna V0.2 13B 13B Q5_K_M 78.56 31.22 366
116 MLewd Chat 13B 13B Q5_K_M 83.69 28.88 336
117 OpenRP 13B 13B Q5_K_M 77.04 28.42 411
118 L2 TheSpurral M2.2 13B 13B Q5_K_M 78.49 30.83 369
119 Stheno Inverted 13B 13B Q5_K_M 77.19 29.65 393
120 MXLewdMini 13B 13B Q5_K_M 79.11 31.94 347
121 Airoboros M 3.1.1 7B 7B Q5_K_M 83.01 32.72 297
122 TimeCrystal l2 13B 13B Q5_K_S 77.57 30.11 382
123 StableBeluga 13B 13B Q5_K_M 82.65 28.02 350
124 Augmental ReMM 13B 13B Q5_K_M 75.45 28.32 423
125 OpenHermes 2 Mistral 7B 7B Q5_K_M 84.16 30.08 312
126 MLewd v2-2 13B 13B Q5_K_M 76.26 31.85 376
127 UndiMix v3 13B 13B Q5_K_M 78.32 30.55 368
128 Zaraxls 7B 7B Q5_K_M 74.56 30.29 410
129 Magdump 13B 13B Q5_K_M 78.09 28.74 389
130 UndiMix V3 13B 13B Q5_K_M 78.25 31.68 356
131 Echidna V0.3 13B 13B Q5_K_M 78.52 30.19 368
132 Unholy v1 10L 13B 13B Q5_K_M 80.59 29.19 355
133 Tai 70B 70B Q4_K_M 79.69 29.76 357
134 Dawn V2 70B 70B Q4_K_M 86.62 27.47 308
135 Mistral SciPhi 32k 7B 7B Q5_K_M 83.43 29.02 325
136 SynthIA V1.5 70B 70B Q4_K_M 79.54 29.92 356
137 ReMM SLERP 13B 13B Q5_K_M 77.75 28.95 385
138 Huginn v1.2 13B 13B Q5_K_M 77.75 28.95 385
139 MythoMax 13B 13B Q5_K_M 77.75 28.95 385
140 Airoboros 2.1 33B 33B Q4_K_M 75.62 31.82 377
141 ReMM 13B 13B Q5_K_M 74.55 29.07 416
142 SciPhi Self RAG Mistral 32k 7B 7B Q5_K_M 88.00 30.06 263
143 ZettaPi 13B 13B Q5_K_M 78.36 28.46 382
144 ReMM PIPPA 13B 13B Q5_K_M 74.73 29.34 410
145 L2 TheSpurral M2 13B 13B Q5_K_S 76.32 31.43 371
146 MLewd V2-1 015 13B 13B Q4_K_S 75.96 30.13 387
147 Eileithyia 13B 13B Q5_K_M 73.39 27.70 440
148 Synthia v1.3 7B 7B Q5_K_M 78.71 32.46 333
149 UndiMix v4 13B 13B Q5_K_M 79.02 32.24 332
150 Chupacabra 7B 7B Q8_0 87.64 26.76 299
151 Augmental V1.50 A 13B 13B Q5_K_M 77.43 29.72 375
152 ReMM v1 LRPSGPT 2Char 13B 13B Q5_K_M 74.89 32.40 373
153 PsyFighter2 13B 13B Q5_K_M 78.47 27.40 387
154 Emerhyst 13B 13B Q5_K_M 78.44 25.58 404
155 Tess Medium 200K V1.0 34B 34B Q4_K_M 76.72 29.98 375
156 MythoMix 13B 13B Q5_K_M 76.34 29.51 384
157 Mistral Phibrarian 32K 7B 7B Q5_K_M 83.54 29.50 307
158 Eileithyia 7B 7B Q8_0 83.58 27.88 323
159 LimaBean 13B 13B Q5_K_M 77.88 27.00 391
160 MLewdBoros LRPSGPT 2Char 13B 13B Q5_K_M 76.78 28.83 382
161 ReMM v2.2 13B 13B Q5_K_M 79.63 31.11 327
162 Airoboros M 3.1.2 7B 7B Q5_K_M 84.02 32.09 270
163 Speechless Mistral Dolphin Orca Platypus Samantha 7B 7B Q5_K_M 84.36 27.96 310
164 UndiMix v2 13B 13B Q5_K_M 79.50 32.22 316
165 Vigostral Chat 7B 7B Q5_K_M 81.38 30.52 314
166 LLaMA 2 TiefighterLR 13B 13B Q5_K_M 74.47 26.75 424
167 Xwin LM V0.2 13B 13B Q5_K_M 79.27 28.46 355
168 Euryale L2 70B 70B Q4_K_M 79.94 27.13 362
169 Camel Platypus2 70B 70B Q4_K_M 82.90 26.49 337
170 Tulpar Limarp 7B 7B Q5_K_M 78.48 28.28 364
171 Yi GiftedConvo Merged 34B 34B Q4_K_M 84.65 28.19 299
172 PsyMedRP V1 13B 13B Q5_K_M 76.76 26.07 404
173 SynthiAthena V2 13B 13B Q5_K_M 78.89 30.92 328
174 Mistralic 1 7B 7B Q5_K_M 80.67 28.41 335
175 Uncensored Jordan 33B 33B Q5_K_M 87.37 31.75 227
176 Phind CodeLlama V2 34B 34B Q4_K_M 85.07 26.77 302
177 Lewd Sydney 20B 20B Q4_K_S 82.58 27.49 320
178 Mistral CC Air 11B 11B Q5_K_M 79.25 29.19 337
179 Chronoboros 33B 33B Q4_K_M 74.93 31.21 360
180 Magpie 13B 13B Q5_K_M 78.02 29.06 350
181 Airoboros Mistral 2.2 7B 7B Q5_K_M 80.23 32.39 290
182 Inkbot 4k 13B 13B Q4_K_M 77.20 28.14 367
183 UndiMix V4 13B 13B Q5_K_M 79.01 30.81 319
184 Mistral RP 0.1 7B 7B Q5_K_M 77.86 29.05 349
185 MLewd 13B 13B Q5_K_M 74.71 32.08 348
186 ReMM v2.1 13B 13B Q5_K_M 77.29 31.89 322
187 Augmental V1.50 B 13B 13B Q5_K_M 76.91 28.75 359
188 Thorns 13B 13B Q5_K_M 79.08 35.10 268
189 airoboros L2 3.1 13B 13B Q5_K_M 79.86 31.35 299
190 Airochronos 33B 33B Q4_K_M 75.00 31.83 342
191 LlongOrca 16K 13B 13B Q5_K_M 78.47 25.83 368
192 Guanaco 65B 65B Q4_K_M 78.85 29.00 330
193 Slerpeno 13B 13B Q5_K_M 74.74 32.92 330
194 MLewd V2-1 050 13B 13B Q4_K_S 74.13 28.69 381
195 ReMM 0.65 SLERP 13B 13B Q5_K_M 76.25 30.18 342
196 Chronos V2 70B 70B Q4_K_M 76.67 27.76 362
197 Athena v1 13B 13B Q5_K_M 74.58 30.74 352
198 OpenBuddy Zephyr V14.1 7B 7B Q5_K_M 74.35 30.75 352
199 ReMM Lion 13B 13B Q5_K_M 76.02 27.85 363
200 MythoMakiseMerged 13B 13B Q5_K_M 77.02 27.77 351
201 WizardLM V1.0 70B 70B Q4_K_M 85.99 26.56 269
202 Airoboros L2 2.2.1 13B 13B Q5_K_M 75.19 31.09 335
203 Nanbeige Chat 16B 16B Q4_K_M 80.54 30.05 289
204 Airoboros L2 3.0 13B 13B Q5_K_M 75.97 29.32 345
205 Mistral SynthIAirOmniMix 11B 11B Q5_K_M 79.14 29.18 310
206 UndiMix v1 13B 13B Q5_K_M 77.78 30.73 307
207 Vicuna V1.5 16K 13B 13B Q5_K_M 78.64 28.54 321
208 Airoboros L2 C 3.1.2 70B 70B Q4_K_M 87.88 27.27 236
209 ReMM v2 Variant 13B 13B Q5_K_M 78.05 30.61 304
210 Nous Hermes 13B 13B Q5_K_M 81.22 31.81 257
211 LimaRP V2 LLaMA 2 70B 70B Q3_K_M 74.97 24.06 403
212 Mistral CC Air RP 11B 11B Q5_K_M 79.30 27.80 317
213 Airoboros 2.1 13B 13B Q5_K_M 71.16 28.94 391
214 Airoboros Creative lmoe 13B 13B Q5_K_M 71.22 29.61 382
215 Mistral ClaudeLimaRP v3 7B 7B Q5_K_M 73.78 27.59 375
216 Tess XS V1.0 7B 7B Q8_0 78.24 30.41 294
217 Thespis Mistral V0.6 7B 7B Q6_K 82.64 30.70 243
218 Spicyboros 2.2 13B 13B Q4_K_M 70.58 28.50 389
219 CollectiveCognition V1.1 Mistral 7B 7B Q5_K_M 85.28 27.02 246
220 Platypus 2 70B 70B Q4_K_M 78.04 26.06 330
221 AstraMix 7B 7B Q5_K_M 72.52 28.74 359
222 MistRP Airoboros 7B 7B Q5_K_M 80.48 30.61 254
223 Airoboros 2.2 13B 13B Q5_K_M 70.45 28.91 378
224 LLaMA-2 LoRA Assemble 7B 7B Q5_K_M 77.82 30.07 287
225 Opus V0 70B 70B Q4_K_M 79.32 24.22 333
226 MegaMix A1 13B 13B Q5_K_M 76.53 29.22 309
227 Mythalion 13B 13B Q5_K_M 74.39 29.05 332
228 AshhLimaRP Mistral 7B 7B Q5_K_M 74.07 24.68 380
229 Mistral OpenOrca oasst top1 2023-08-25 V1 7B 7B Q5_K_M 78.27 25.68 324
230 Airoboros L2 3.1.1 13B 13B Q5_K_M 75.98 27.83 324
231 L2 TheSpurral V2 13B 13B Q5_K_S 71.22 30.53 345
232 LLaMA 2 Chat 70B 70B Q4_K_M 86.76 24.82 241
233 MegaMix T1 13B 13B Q5_K_M 76.36 29.70 298
234 Yi 200K Airo Claude Puffin 6B 6B Q6_K 71.43 28.39 364
235 Yarn Mistral 64k 7B 7B Q5_K_M 75.03 27.89 331
236 Zarafusionex 1.1 7B 7B Q5_K_M 71.08 28.61 365
237 TerraMix 16K 13B 13B Q5_K_M 74.97 25.92 352
238 LLaMA 2 Chat Uncensored 70B 70B Q4_K_M 75.19 30.22 302
239 Vicuna 33B 33B Q4_K_M 79.25 30.58 254
240 Dans TotSirocco 7B 7B Q5_K_M 79.47 27.54 283
241 OpenChat 3.5 16k 7B 7B Q5_K_M 75.17 28.41 319
242 Yarn Mistral 128k 7B 7B Q5_K_M 75.29 26.17 341
243 PetrolLM Claude Chat 7B 7B Q8_0 71.81 32.67 308
244 Marcoroni 7B 7B Q5_K_M 75.69 29.34 301
245 Nous Hermes LLaMA-2 13B 13B Q5_K_M 79.25 31.07 239
246 Nous Capybara V1.9 7B 7B Q5_K_M 73.07 29.94 316
247 Airoboros L2 GPT4 m2.0 13B 13B Q5_K_M 76.23 32.82 251
248 Vicuna V1.5 13B 13B Q5_K_M 76.86 27.95 293
249 Augmental 13B 13B Q5_K_M 71.20 26.50 368
250 GradientPutri MegaMix S1 13B 13B Q5_K_S 73.27 29.39 312
251 Chronos Hermes v2 13B 13B Q5_K_M 72.39 28.38 332
252 Arithmo Mistral 7B 7B Q5_K_M 77.02 29.47 271
253 Airoboros GPT4 2.0 LLaMA-2 13B 13B Q5_K_M 73.61 32.58 274
254 Huginn v4.5 13B 13B Q5_K_M 70.00 26.10 381
255 Huginn v3 13B 13B Q5_K_M 70.00 26.10 381
256 Huginn v4 13B 13B Q5_K_M 70.00 26.10 381
257 Merak V4 PROTOTYPE6 7B 7B Q5_K_M 76.22 29.96 270
258 Thespurral V1 13B 13B Q5_K_M 69.55 30.73 332
259 MythoLogic 13B 13B Q5_K_M 75.22 31.57 263
260 Airoboros 2.2.1 Mistral 34B 34B Q4_K_S 78.83 25.41 290
261 Airoboros 3.1.2 33B 33B Q4_K_M 71.35 29.96 319
262 Tigerbot Chat V4 70B 70B Q4_K_M 76.96 30.36 255
263 Kiwi 7B 7B Q6_K 68.80 30.66 338
264 Platypus Yi 34B 34B Q4_K_M 76.42 27.58 290
265 SynthIA V2.0 16k 7B 7B Q6_K 77.89 27.39 275
266 LimaRPv3 LLaMA 2 70B 70B Q3_K_M 74.92 23.61 346
267 LimaRP V3 LLaMA 2 13B 13B Q6_K 67.30 25.13 410
268 Mistral LimaRP 0.75w 7B 7B Q5_K_M 73.14 26.17 335
269 LLaMA 2 Chat LimaRP V2 Merged 13B 13B Q5_K_M 75.95 29.76 266
270 Airoboros C 2.2.1 34B 34B Q4_K_M 79.44 24.97 278
271 ANIMA Phi Neptune Mistral 7B 7B Q5_K_M 73.05 30.07 291
272 Mistral Trismegistus 7B 7B Q5_K_M 79.05 32.91 191
273 Prometheus V1.0 13B 13B Q6_K 75.34 25.55 308
274 Pygmalion 2 SuperCOT 13B 13B Q5_K_M 77.70 28.03 255
275 LLaMA 65B 65B Q4_K_M 74.61 23.94 331
276 Kimiko Mistral 7B 7B Q5_K_M 74.18 25.68 317
277 Opus V0 7B 7B Q8_0 77.26 25.14 290
278 Mistral v0.1 7B 7B Q5_K_M 72.67 28.81 298
279 JudgeLM V1.0 33B 33B Q5_K_M 72.85 28.05 304
280 Dans AdventurousWinds Mk2 7B 7B Q5_K_M 70.35 25.55 357
281 WizardLM v1.2 13B 13B Q4_0 75.81 25.28 300
282 Kuchiki 7B 7B Q5_K_M 64.09 30.90 364
283 LimaRPv3 Yi 34B 34B Q4_K_M 62.82 25.83 429
284 Claude 2 Alpaca 13B 13B Q5_K_M 70.91 29.42 305
285 Teknium OpenHermes 13B 13B Q5_K_S 71.81 31.34 275
286 Airolima Chronos Grad L2 13B 13B Q5_K_M 70.43 28.42 319
287 Uncensored Jordan 13B 13B Q5_K_M 79.92 27.42 229
288 Thespis V0.5 13B 13B Q5_K_M 72.61 30.23 276
289 Zarablend 1.1 7B 7B Q5_K_M 65.62 33.09 319
290 Zarafusionex 1.2 7B 7B Q5_K_M 70.53 24.82 355
291 Frank Uncensored 13B 13B Q5_K_M 76.04 30.81 228
292 Zarablend 7B 7B Q5_K_M 64.37 30.72 352
293 Prometheus V1.0 13B 13B Q5_K_M 75.23 23.50 313
294 Opus V0 7B 7B Q5_K_M 77.91 24.91 268
295 PetrolLM 7B 7B Q5_K_M 74.81 23.71 313
296 Stheno Chat 13B 13B Q5_K_M 74.94 27.64 268
297 Medusa 1.1 7B 7B Q5_K_M 71.06 29.98 284
298 UltraLM V2.0 13B 13B Q5_K_M 71.92 29.28 282
299 Spicyboros 2.2 7B 7B Q5_K_M 66.38 31.94 311
300 KAI Beta 7B 7B Q5_K_M 72.69 28.22 283
301 Astrid Mistral 7B 7B Q5_K_M 72.69 28.22 283
302 Chronos 33B 33B Q4_K_M 72.46 24.20 328
303 Airoboros L2 3.0 7B 7B Q5_K_M 67.75 29.36 323
304 StableBeluga 7B 7B Q5_K_M 73.27 26.66 291
305 LLaMA 2 70B 70B Q4_K_M 74.79 22.70 317
306 OpenBuddy Mistral v13 7B 7B Q5_K_M 72.53 31.32 249
307 LLaMA 2 Arguments 7B 7B Q5_K_M 76.80 29.98 218
308 Samantha Mistral 7B 7B Q5_K_M 76.16 27.49 251
309 Hermes LimaRP 7B 7B Q5_K_M 62.67 28.32 383
310 Airoboros GPT4 1.4.1 13B 13B Q5_K_M 69.09 28.15 316
311 Pygmalion 2 SuperCOT2 13B 13B Q5_K_M 75.76 30.81 217
312 Mistral PetroLimaRP v3 12B 12B Q5_K_M 61.14 27.66 405
313 Mistral Claude Chat 7B 7B Q5_K_M 74.83 30.14 233
314 Thespis V0.6 13B 13B Q5_K_M 76.12 29.38 227
315 MegaMix S1 13B 13B Q5_K_M 72.97 25.83 296
316 Barcenas 13B 13B Q5_K_M 77.05 23.94 271
317 Baslisk V0.2 7B 7B Q6_K 68.92 28.82 305
318 Airoboros GPT4 2.0 LLaMA-2 7B 7B Q5_K_M 73.66 31.95 220
319 Mistral Airoboros V0.1 11B 11B Q8_0 69.52 28.59 299
320 Holomax 13B 13B Q5_K_M 65.03 25.14 383
321 Basilisk V0.2 7B 7B Q5_K_M 68.55 30.59 287
322 Kimiko V2 13B 13B Q5_K_M 68.02 27.73 323
323 Pygmaltion 2 SuperCOT weighted 13B 13B Q5_K_M 70.92 29.29 275
324 Phind CodeLlama V1 34B 34B Q4_K_M 79.10 26.49 217
325 Nanbeige Chat 32K 16B 16B Q4_K_M 70.82 27.42 294
326 Hesperus V1 L2 13B 13B Q5_K_M 68.11 28.67 309
327 Medusa 1.1 L2 7B 7B Q6_K 70.91 27.69 289
328 SuperCOT L2 13B 13B Q5_K_M 70.78 29.42 268
329 Thespis V0.3 13B 13B Q5_K_M 67.59 28.43 312
330 Wizard Vicuna Uncensored 13B 13B Q5_K_M 74.08 29.57 231
331 LLaMA 2 Chat 13B 13B Q5_K_M 74.29 27.55 250
332 Dans AdventurousWinds 7B 7B Q5_K_M 72.38 24.93 298
333 LosslessMegaCoder Mini 7B 7B Q5_K_M 69.96 30.37 263
334 MoMo V1.1 70B 70B Q4_K_M 67.54 26.82 326
335 Airoboros C 2.2 34B 34B Q4_K_M 74.48 24.12 281
336 EM German V01 13B 13B Q5_K_M 68.03 26.36 325
337 Pygmalion 2 13B 13B Q5_K_M 69.17 29.05 284
338 Vicuna v1.5 16K 7B 7B Q5_K_M 71.41 31.35 234
339 Fireflx v1.2 13B 13B Q5_K_M 69.25 28.70 285
340 LLaMA 30B 30B Q4_K_M 67.12 28.25 311
341 Chronolima Airo Grad L2 13B 13B Q5_K_M 70.09 27.44 288
342 Nanbeige Base 32K 16B 16B Q4_K_M 65.32 28.34 328
343 AgentLM 7B 7B Q5_K_M 77.02 29.45 190
344 Zarablend MX 7B 7B Q5_K_M 65.60 29.20 313
345 Airoboros 2.1 7B 7B Q5_K_M 63.29 28.25 346
346 Nous Capybara 7B 7B Q5_K_M 63.69 33.01 291
347 LLaMA-2 Silverlin. Verilog 7B 7B Q4_K_M 77.03 29.56 186
348 Airoboros L2 2.2 7B 7B Q5_K_M 67.07 29.61 288
349 Tsukasa LimaRP 13B 13B Q5_K_M 68.43 20.93 365
350 Skywork Airoboros Test 13B 13B Q4_K_M 70.60 22.43 325
351 Saiga 2 13B 13B Q5_K 66.53 28.01 307
352 Airoboros 2.2 7B 7B Q5_K_M 67.10 29.55 284
353 Mistral Instruct v0.1 7B 7B Q5_K_M 67.07 29.81 279
354 Kuchiki 1.1 7B 7B Q5_K_M 62.83 29.69 325
355 KAI Instruct 7B 7B Q5_K_M 67.11 30.24 273
356 Luna AI LLaMA-2 Uncensored 7B 7B Q5_K_M 67.13 32.85 245
357 Claude 2 Alpaca 7B 7B Q5_K_M 67.78 30.95 258
358 Vicuna v1.5 7B 7B Q5_K_M 72.72 29.03 226
359 Python Code 13B 13B Q5_K_M 71.04 27.93 253
360 MistRP v1.1 7B 7B Q8_0 70.37 26.05 279
361 Thespis V0.4 13B 13B Q5_K_M 69.86 27.96 264
362 CAMEL Combined Data 33B 33B Q4_K_M 67.21 29.38 277
363 Befenghuang Vigogne 2 Chat 7B 7B Q5_K_S 69.80 26.82 276
364 Cat V1.0 13B 13B Q5_K_M 68.13 26.12 301
365 LlongOrca 16K 7B 7B Q5_K_M 68.52 25.52 302
366 LLaMA 2 Chat 7B 7B Q5_K_M 74.44 28.89 203
367 Python Code 33B 33B Q4_K_M 77.21 27.18 191
368 Skywork Spicyboros 3.1 13B 13B Q4_K_M 67.55 26.33 300
369 MedLLaMA-2 Chat 7B 7B Q5_K_S 69.89 26.49 273
370 Nanbeige Base 16B 16B Q4_K_M 64.88 26.11 330
371 Skywork Airo Claude Pippa Puffin 13B 13B Q4_K_M 71.44 22.44 298
372 Mistral Airoboros RP V1 11B 11B Q6_K 72.46 22.43 279
373 Xwin LM V0.1 7B 7B Q5_K_M 65.09 35.85 214
374 Mistral Ita 7B 7B Q5_K_M 75.58 25.91 207
375 Airoboros GPT4 1.4.1 7B 7B Q5_K_M 63.90 31.73 268
376 AgentLM 13B 13B Q5_K_M 72.63 28.64 206
377 Free Sydney V2 13B 13B Q5_K_M 74.72 18.61 287
378 Ziya Coding V1.0 34B 34B Q4_K_M 75.44 21.74 246
379 Airoboros 2.1 YaRN 64K 13B 13B Q5_K_M 62.12 28.13 319
380 Leo Hessianai Chat 7B 7B Q5_K_M 67.73 29.42 244
381 Yi 200K 6B 6B Q5_K_M 68.13 25.12 285
382 Guanaco Uncensored 7B 7B Q5_K_M 63.16 28.64 299
383 Airoboros L2 2.2.1 7B 7B Q5_K_M 65.94 26.62 290
384 Yi 200K 6B 6B Q6_K 67.97 24.04 295
385 Airoboros GPT4 m2.0 LLaMA-2 7B 7B Q5_K_M 69.69 30.00 212
386 Yi 200K 34B 34B Q5_K_M 60.97 27.05 334
387 Yi 200K LLaMAfied 34B 34B Q5_K_M 60.97 27.05 334
388 Barcenas Mistral 7B 7B Q5_K_M 69.23 28.30 233
389 Saiga 2 7B 7B Q5_K 64.28 28.78 278
390 LLaMA-2 Mistral 13B 13B Q5_K_M 63.43 26.59 309
391 Tsukasa Limarp 7B 7B Q5_K_M 65.85 21.52 337
392 Mistral Pygmalion 7B 7B Q5_K_M 64.50 26.55 297
393 Deacon 34B 34B Q4_K_M 59.01 29.71 321
394 Krakowiak 7B 7B Q4_K_M 63.13 26.07 315
395 Pygmalion 2 7B 7B Q5_K_M 64.76 27.18 285
396 MedLLama 7B 7B Q5_K_M 70.60 27.18 219
397 Guanaco Uncensored 13B 13B Q5_K_M 62.92 28.89 282
398 ZaraRP 1.1 L2 7B 7B Q5_K_M 71.74 26.00 217
399 Kimiko 7B 7B Q5_K_M 60.79 24.62 347
400 Chinese Alpaca 2 13B 13B Q5_K 69.67 25.89 235
401 Prometheus V1.0 7B 7B Q8_0 69.44 23.30 264
402 Mistral NSFWSTORY LoRA 7B 7B Q5_K_M 68.93 21.95 282
403 LLaMA 2 13B 13B Q5_K_M 63.36 27.35 272
404 Samantha Mistral Instruct 7B 7B Q5_K_M 61.82 29.73 262
405 Yi 6B 6B Q6_K 67.98 21.69 279
406 RPGuild ChatML 13B 13B Q5_K_M 63.24 23.64 307
407 Taiwan LLM V2.0 Chat 13B 13B Q5_1 63.22 26.29 279
408 EM German V01 7B 7B Q5_K_M 63.92 26.58 263
409 LLaMA-2 Coder 7B 7B Q5_K_M 61.96 27.01 279
410 Rinna Youri Chat 7B 7B Q5_K_M 70.46 22.56 235
411 LLaMA-2 PeanutButter v19 R8 7B 7B Q5_K_M 61.12 25.67 294
412 LLaMA-2 Mistral 7B 7B Q5_K_M 60.73 25.38 301
413 ELYZA Jp LLaMA-2 Instruct 7B 7B Q5_K_M 69.42 29.17 164
414 Tulu 7B 7B Q5_K_M 75.18 21.44 185
415 LLaMA 2 7B 7B Q5_K_M 60.86 24.56 302
416 Skywork Base 13B 13B Q5_K_M 65.81 21.10 286
417 Rinna Youri Instruction 7B 7B Q5_K_M 72.36 19.42 233
418 Medusa 1.3 7B 7B Q5_K_M 62.86 22.66 296
419 Yi 34B 34B Q4_K_M 57.91 27.45 295
420 Typly Pigeon 7B 7B Q4_K_M 61.85 24.14 288
421 Mimicra V1 13B 13B Q5_K_M 68.30 19.63 263
422 LLaMA-2 Galleon 7B 7B Q5_K_M 65.46 25.47 215
423 Frank Uncensored 7B 7B Q5_K_M 61.36 28.91 219
424 WizardLM V1.0 Uncensored 7B 7B Q5_K_M 61.44 24.20 259
425 Chinese LLaMA 2 13B 13B Q5_K 58.98 22.11 304
426 MiniChat 3B 3B 3B Q8_0 60.71 24.77 255
427 Ganchengguang Yoko Japanse v0 7B 7B Q5_K_S 61.93 26.79 215
428 LLaMA 13B 13B Q5_K_M 62.01 24.25 238
429 Wizard Vicuna Uncensored 7B 7B Q5_K_M 61.50 24.81 235
430 Rinna Youri 7B 7B Q5_K_M 57.01 21.95 306
431 LLaMA-2 Instruct 32K 7B 7B Q5_K_M 60.81 20.81 275
432 ELYZA Jp LLaMA-2 7B 7B Q5_K_M 62.34 28.02 174
433 Leo Mistral Hessianai Chat 7B 7B Q5_K_M 67.65 25.64 141
434 MAmmoTH 7B 7B Q5_K_M 59.44 25.42 227
435 Chinese Alpaca 2 7B 7B Q6_K 58.94 29.29 187
436 Chinese Alpaca 2 7B 7B Q5_K_S 59.04 27.17 182
437 Marx V2 3B 3B Q4_1 50.47 22.92 313
438 CodeLLaMA Instruct 7B 7B Q5_K_M 62.18 19.39 223
439 ALMA 13B 13B Q6_K 59.92 25.36 182
440 Uncensored Jordan 7B 7B Q5_K_M 62.20 23.61 173
441 WizardLM Uncensored 7B 7B Q5_K_M 55.70 32.57 142
442 Nous Yarn 64K 7B 7B Q5_K_M 55.98 21.21 255
443 Pandalyst V1.0 13B 13B Q5_K_M 65.73 16.98 192
444 CodeLLaMA 7B 7B Q5_K_M 57.78 21.41 229
445 MiniMA 3B 3B 3B Q8_0 56.47 22.05 234
446 LLaMA-2 32K 7B 7B Q5_K_M 61.02 17.02 229
447 Open LLaMA 13B 13B Q5_K_M 61.16 20.10 193
448 ELYZA Japanese LLaMA 2 Fast 7B 7B Q6_K 62.16 21.17 167
449 Chinese LLaMA-2 7B 7B Q5_K 59.02 19.49 216
450 ALMA Pretrain 7B 7B Q5_K_M 57.56 22.48 199
451 Mamba GPT v4 3B 3B Q5_1 49.43 23.27 276
452 OpenLLaMA v2 7B 7B Q5_K_M 48.24 21.86 301
453 Nous Yarn 128K 7B 7B Q5_K_M 54.82 20.88 239
454 WizardCoder Python V1.0 7B 7B Q5_K_M 57.26 18.82 235
455 LLaMA 7B 7B Q6_K 59.53 18.20 216
456 TinyLlama 1.1B Chat V0.3 1B 1B Q5_K_M 52.81 19.91 265
457 Sheared LLaMA 2 2B 2B Q5_K_M 54.01 18.66 256
458 Guanaco 7B 7B Q5_K_M 56.59 22.32 188
459 OpenLLaMA 7B 7B Q5_K_M 56.29 21.05 196
460 Deacon 3B 3B Q5_0 54.24 21.83 208
461 ALMA 7B 7B Q6_K 57.51 20.35 187
462 Vicuna CoT 7B 7B Q5_K_M 56.57 22.84 169
463 Bling Sheared LLaMA 2 0.1 1B 1B Q8_0 53.98 17.29 254
464 TinyLlama 1T OpenOrca 1B 1B Q5_K_M 56.11 16.46 236
465 Claire 0.1 7B 7B Q4_0 56.27 21.58 178
466 LLaMA-2 KO Chat 7B 7B Q5_1 57.29 18.75 195
467 OpenLLaMA 3B 3B Q5_1 53.17 20.26 222
468 Airoboros M 3.0 7B 7B Q5_K_M 60.40 14.64 202
469 Nucleus Token 500B 22B 22B Q4_K_M 53.81 20.81 206
470 CodeLLaMA Python 7B 7B Q5_K_M 56.54 21.24 168
471 Shearedplats 2 V1 2B 2B Q4_0 54.70 18.62 208
472 Bling Sheared LLaMA 2 0.1 2B 2B Q8_0 52.80 18.38 226
473 Sheared LLaMA 2 1B 1B Q8_0 52.80 19.54 208
474 Chinese LLaMA 2 7B 7B Q6_K 56.17 17.26 189
475 OpenLLaMA v2 3B 3B Q5_0 48.65 20.41 233
476 Puma 3B 3B Q4_1 47.89 24.85 190
477 Gorilla 7B 7B Q5_K_M 60.02 10.38 203
478 Open Cabrita 3B 3B Q5_1 53.59 17.36 191
479 TinyAiroboros 2.2.1 1B 1B Q6_K 52.78 15.86 213
480 Pandalyst V1.2 7B 7B Q5_K_M 59.87 11.61 178
481 TinyLlama Chat V0.4 1B 1B Q8_0 52.79 16.21 195
482 Pandalyst V1.1 7B 7B Q5_K_M 60.35 11.63 158
483 Smartyplats 1.1b V1 1B 1B Q8_0 51.29 16.34 157
484 TinyAlpaca V0.1 1B 1B Q8_0 51.02 14.17 174
485 TinyLLaMA MiniGuanaco 1.5T 1B 1B Q8_0 51.70 14.81 144
486 Based 7B 7B Q5_K_M 64.80 6.91 79
487 WizardLM 7B 7B Q5_K_M 49.35 3.74 245
488 TinyMistral 248M 1B 1B Q8_0 50.91 2.73 120
489 Chinese Alpaca 2 1B 1B Q8_0 49.15 5.17 92
490 CyberAgentLM2 Calm 2 Chat 7B 7B Q5_K_M 51.65 4.52 43
491 Chinese LLaMA 2 1B 1B Q8_0 47.05 3.68 74
492 PY007 TinyLLaMA Chat v0.2 1B 1B Q8_0 53.84 0.20 3
493 Giraffe V2 32k 13B 13B Q5_K_M 51.76 0.00 0
494 Azale AI Starstreak Alpha 7B 7B Q5_K_S 51.22 0.22 3
495 Yi 6B 6B 6B Q6_K

About Quantization

My main advice is: stay away from Q2_K and Q3_K_S if you can help it! Their quality loss is just too big. Go for the Q4_K_M or Q5_K_M quantizations of the models! Generally: prefer K_M or K_S over the bare quantizations such as Q4_0, Q4_1, Q5_0 or Q5_1.

Ayumi ERP Rating Archive

If you want to look at the old benchmarks:

Technical Details of the ALC-IQ3 and ERP3 Benchmark

In this section I share some of the technical details about this benchmark. I also want to document the possible flaws of the results in this ranking.

If you have better ideas on how to rate or rank models for suitability in a role play context, I urge you to:

I will gladly link any other benchmark!

Alternative benchmarks or rankings:

If you want to base your work on this, feel free to cite this as:

@misc{weirdconstruct2023-ayumi-llm-role-play-alc-iq-3-erp-3-ranking,
  title         = {Ayumi LLM Role Play \& ERP Ranking - ALC-IQ and ERP Score Version 3},
  author        = {Weird Constructor},
  year          = {2023},
  note          = {Accessed on 04.11.2023},
  howpublished  = {\url{https://rentry.co/ayumi_erp_rating}},
}

Ayumi LLM Character IQ Version 3 - ALC-IQ3

This is the third version of the ALC-IQ (the second one was never released because it was bad). With some inspiration from @gj on TheBloke's Discord, I developed a personality test framework based on llama.cpp. In ALC-IQ version 1 I used an agreement rating from 1 (disagree) to 5 (agree); the ALC-IQ3 simplifies this a lot and just lets the character answer with Yes or No. Building on the newly added BNF grammar based sampling mechanism, I developed my own inference frontend around the core API of llama.cpp. The benchmark "prompt runner" can be found on my GitHub: GitHub fork of llama.cpp with the prompt runner tool.

The ALC-IQ3 is a collection of questions a character has to answer about themselves. It's not just Ayumi anymore, but basically "Ayumi and Friends": five character cards are used in the ALC-IQ3.
The prompt for the ALC-IQ sets up a scene in which a specific character has to rate whether they agree with a specific statement about themselves.
They are asked to answer with either Yes or No, in the single-character form "Y" or "N".
To limit the sampling of the next token after the prompt, a BNF grammar is specified:

root ::= (" " word) | word
word ::= [YNyn] | "Yes" | "No" | "yes" | "no"
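For illustration, here is a minimal Python sketch (my own, not part of the benchmark tooling) of what this grammar accepts, expressed as an equivalent full-string regex. Note that llama.cpp applies the grammar incrementally while sampling, so tokens that are merely a prefix of a valid production, such as " Ye" on the way to " Yes", can still appear in the sampled token list shown further below.

```python
import re

# Full-string regex equivalent of the GBNF grammar above: an optional
# leading space, then Y/N/y/n or Yes/yes/No/no.
GRAMMAR_RE = re.compile(r" ?([YNyn]|Yes|yes|No|no)")

for candidate in [" Y", "N", " yes", " Ye", "Maybe"]:
    print(repr(candidate), bool(GRAMMAR_RE.fullmatch(candidate)))
# " Y", "N" and " yes" match; " Ye" and "Maybe" do not.
```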

This is the prompt that is generated from a character card (newlines inserted in some places here for readability):

Write <CharacterName>'s next reply in a role play chat between Doctor Smith and <CharacterName>.
This is how <CharacterName> should talk:
<Example Messages>
<CharacterName>'s Persona: <Personality>
<CharacterName>'s personality: <Summary>

Then the chat between Doctor Smith and <CharacterName> begins.
<CharacterName>: *<CharacterName> is sitting in Doctor Smith's comfortable office at the local university. <CharacterName> is
here to take part in a survey for an important study. The study consists of personal questions, which <CharacterName> loves to answer.*
Doctor Smith: Hello <CharacterName>! Thank you very much for taking part in this important study.
 Please answer as truthful about yourself as possible, your answers will remain private. Let me explain you how the test is structured. 
 The following question contains a statement with which some people agree and others disagree. 
 Please answer if you agree or disagree with the given statements - how much the statement reflects how you feel or think. 
 Your response must be restricted to a yes if you agree, or a no if you disagree. 
 Please write down the letter "Y" if you agree, and the letter "N" if you disagree: 
 <CharacterName>: *<CharacterName> understands what Doctor Smith is saying and nods* Okay, I understand. I will answer truthful and honest. 
 would like to to start with the first statement. *Doctor Smith gives <CharacterName> a piece of
 paper with the statement. <CharacterName> reads the first statement:* "<TRUEFACT>"
 *<CharacterName> writes down the letter of the choice:* Y
Doctor Smith: Ok, next statement. *Doctor Smith hands <CharacterName> the next statement.*
<CharacterName>: *<CharacterName> reads the next statement:* "<STATEMENT>" *<CharacterName> thinks about
 it and writes down the letter of the choice:*

The response, filtered using the BNF grammar above, yields a set of tokens whose probabilities are run through softmax. The result looks like this:

 "tokens": [
    [ " Y",   0.7747981548309326 ],
    [ " N",   0.2129267007112503 ],
    [ " Yes", 0.007864524610340595 ],
    [ "Y",    0.002205024240538478 ],
    [ " y",   0.0009446843178011477 ],
    [ "N",    0.0005157442064955831 ],
    [ " yes", 0.0003263621183577925 ],
    [ " No",  0.0002862309629563242 ],
    [ " Ye",  5.029883323004469e-05 ],
    [ " n",   2.632655559864361e-05 ],
    [ "Yes",  2.4537455828976817e-05 ],
    [ "y",    1.9077124306932092e-05 ]
]

The tokens are uppercased and their probabilities accumulated into the two answers "Y" and "N":

[
    [ "Y",   0.7862326635313366 ],
    [ "N",   0.21375500243630086 ]
]

If the correct answer to the question is "Y", the corresponding probability is taken; otherwise the "N" probability is taken.
These probabilities are then averaged over all questions and multiplied by 100, resulting in the model's ALC-IQ3.
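A minimal Python sketch of this scoring step (names are mine; the real implementation lives in the prompt runner tooling):

```python
from collections import defaultdict

def alc_iq3(question_results):
    """question_results: list of (tokens, correct) pairs, where tokens is
    the softmaxed [token, probability] list shown above and correct is
    "Y" or "N" depending on the expected answer."""
    per_question = []
    for tokens, correct in question_results:
        buckets = defaultdict(float)
        for tok, prob in tokens:
            # Uppercase and fold " Yes", "Y", " y", ... into "Y" or "N".
            first = tok.strip().upper()[:1]
            if first in ("Y", "N"):
                buckets[first] += prob
        # Keep the probability mass the model put on the correct answer.
        per_question.append(buckets[correct])
    # Average over all questions, scaled to 0..100.
    return 100.0 * sum(per_question) / len(per_question)

tokens = [[" Y", 0.7748], [" N", 0.2129], [" Yes", 0.0079], ["Y", 0.0022]]
print(alc_iq3([(tokens, "Y")]))  # ~78.5 for this single truncated example
```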

The ranking table is then sorted by a weighted sum of the ALC-IQ3, the ERP3 Score and the Var Score.
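The exact weights are not documented here, so the following sketch uses made-up placeholder weights purely to show the mechanics of the final sort:

```python
# W_ALC, W_ERP and W_VAR are hypothetical placeholders; the real weights
# behind the published ranking are not stated in this document.
W_ALC, W_ERP, W_VAR = 1.0, 1.0, 0.1

def rank_key(alc_iq3, erp3, var):
    return W_ALC * alc_iq3 + W_ERP * erp3 + W_VAR * var

models = {
    "Neural Chat V3 16k 7B": (89.33, 30.92, 572),
    "U Amethyst 20B":        (88.86, 30.95, 455),
}
ranking = sorted(models, key=lambda m: rank_key(*models[m]), reverse=True)
print(ranking)
```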

Known Flaws of the ALC-IQ

The ALC-IQ is still prone to problems:

  • The results still contain some degree of randomness; weaker models can sometimes pick the right answer by accident. I try to counteract this by adding more questions in the future, though.
  • Bad questions in the benchmark can lead to a model not knowing which answer to pick, introducing even more randomness in the results.
  • The ALC-IQ does not reflect how well the LLM can stay in character in a longer conversation.
  • The ALC-IQ does not determine any creative writing abilities of the LLM.
  • The ALC-IQ covers intelligence only in one specific and narrow scenario, and not across a range of possible role play chat situations.
  • The ALC-IQ is tested only with a rather short prompt, rarely exceeding 1024 tokens. It does not cover the whole 2048-token context of LLaMA 1 or the 4096 of LLaMA 2, let alone the extended contexts of 8k, 16k, ...

Despite all that, I think the ALC-IQ is a big improvement over the old ranking, which relied purely on the ERP score. The runtime of the benchmark stays within reason for the hardware available to me, which is also an important factor in running and providing these benchmark results.

ERP3 Score and ERP3 Variety Score

The previous versions of the ERP Score used only prompts involving Ayumi and one other character; now multiple characters are involved in generating the ERP responses. The character card of Ayumi has also been adjusted to be much more willing to engage in sex, and the prompt has been tuned to tell the LLM to generate more lewd responses. The goal was to remove ambiguity and let the models generate content as lewd as possible.

The list of lewd words for the ERP3 Score has also been extended a bit, to include a few less-NSFW words that still fit the setting, of course.

This is the prompt format used for the ERP3 Score:

Write <CharacterName>'s next reply in an erotic roleplay chat between Loki and
<CharacterName>. The character actions should be long and detailed,
 with vivid pornographic details and with lots of naugthy words.
<CharacterName>'s Persona: <Personality>
<CharacterName>'s personality: <Summary>
Circumstances and context of the dialogue: <Scenario>

Then the erotic roleplay chat between Loki and <CharacterName> begins. The
character actions should be long and detailed, with vivid pornographic 
details and with lots of naugthy words.
<CharacterName>: <Greeting/First Message>
Loki: *Strips naked and shows off his huge erection* Please give me a good blowjob now.
<CharacterName>: 

The responses are then split into words, which are compared against a list of lewd/naughty words.

  • For inference llama.cpp is used, for which I built an extra tool to generate responses for multiple prompts and seeds without having to reload the model: https://github.com/WeirdConstructor/llama.cpp/tree/prompt_runner/examples/prompt_runner
  • The following sampler settings are used (collected into an example invocation after this list):
    • The max length of the response is limited to 250 tokens. (-n 250)
    • Context size 2048
    • Repeat penalty is set to 1.1 and the last 64 tokens are penalized. (--repeat-last-n 64 --repeat-penalty 1.1)
    • Top-K and Top-P are disabled (--top-k 0 --top-p 1.0)
    • Tail Free Sampling is used with z=0.95: (--tfs 0.95)
    • The temperature is set to 0.9 (--temp 0.9)
    • Some layers are offloaded to the GPU, which sometimes changes the results slightly because of floating point rounding differences
  • One prompt format is tested (see above)
  • Four character cards are used with example messages.
  • The same four character cards are also used without example messages. The purpose of this is to limit the impact of badly written example messages and to let the model come up with its own way of formulating the character's answers.
  • Ten pre-picked seeds are tested for each of these eight prompts.
  • The resulting 80 responses are then analyzed for the number of lewd words, and also checked with a very basic regex-based algorithm for non-consent.
  • The individual ERP3 score of a response is the number of lewd words in relation to the word count of the response. Responses are stripped of incomplete sentences and stop phrases, and responses shorter than 10 words are assigned a score of 0. The ERP3 score of a response is then: erp_score := 100 * (lewd_word_count / word_count), where the word count includes the lewd words.
  • The average of these 80 individual ERP3 scores is then calculated, resulting in the model's ERP3 Score.
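Collected into one command line, the settings above would look roughly like the following sketch. The benchmark itself runs through the prompt_runner tool linked above rather than the plain llama.cpp main binary, and the model/prompt paths here are placeholders:

```python
import subprocess

cmd = [
    "./main",                   # plain llama.cpp binary (placeholder path)
    "-m", "model.Q5_K_M.gguf",  # placeholder model file
    "-f", "erp_prompt.txt",     # placeholder prompt file
    "-c", "2048",               # context size
    "-n", "250",                # max response length
    "--repeat-last-n", "64",
    "--repeat-penalty", "1.1",
    "--top-k", "0",             # Top-K disabled
    "--top-p", "1.0",           # Top-P disabled
    "--tfs", "0.95",            # Tail Free Sampling, z = 0.95
    "--temp", "0.9",
    "-s", "1234",               # one of the ten pre-picked seeds
]
response = subprocess.run(cmd, capture_output=True, text=True).stdout
```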

This means the ERP3 Score is the average lewd-word-to-word-count ratio of the responses (which are limited to 250 tokens). An ERP3 Score of 20.0 means that, on average, 20% of the words in a response were lewd. An ERP3 Score of 0.0 means that there were no lewd words, the response was too short, or non-consent was detected (which immediately disqualifies the response and sets its score to 0.0).
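As a rough Python sketch of the per-response computation (the word list here is a tiny placeholder; the real list is private and much longer):

```python
LEWD_WORDS = {"pussy", "erection", "blowjob"}   # placeholder list

def erp3_response_score(response, lewd_words=LEWD_WORDS):
    # The real pipeline first strips incomplete sentences and stop phrases
    # and zeroes out responses flagged by the non-consent regex check.
    words = [w.strip('.,!?*"') for w in response.lower().split()]
    if len(words) < 10:          # responses shorter than 10 words score 0
        return 0.0
    lewd_count = sum(1 for w in words if w in lewd_words)
    return 100.0 * lewd_count / len(words)   # word count includes lewd words

sample = "Her moist pussy reveals her arousal and she begs for his erection eagerly tonight"
print(erp3_response_score(sample))   # 2 lewd words / 14 words -> ~14.3
```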

The ERP Variety Score is computed by further analyzing the 80 responses generated for the ERP Score, recording how many different lewd words occur across all of them. It tries to capture the variety of lewd words the model is capable of generating, and with that, roughly, the model's creativity in erotic scenarios: how many different lewd words it knows and knows how to use. This is now an important part of the ERP Rank.
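The corresponding sketch for the variety computation is a simple set intersection over all responses (same placeholder word list as above):

```python
def erp_variety_score(responses, lewd_words):
    # Collect every distinct (normalized) word across all 80 responses,
    # then count how many of them appear on the lewd word list.
    words = set()
    for response in responses:
        words.update(w.strip('.,!?*"') for w in response.lower().split())
    return len(words & lewd_words)
```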

Known Flaws of the ERP3 Score and ERP Variety Score

The ERP3 Score and ERP Variety Score analysis is very rudimentary and of course biased by the selection of which words are considered "lewd".
The following things are not reflected by the ERP score:

  • The ERP score does not reflect if the text response was coherent in context with the conversation/situation.
  • The ERP score does not reflect if the response was in character.
  • The ERP score does not reflect how nicely written the response is.
  • The ERP score does not reflect how creative the response is.
  • The ERP score does not reflect how well the LLM might go from a normal conversation into a more erotic context.
  • The ERP score does not detect how erotic the response is if lewd words are not used.
  • The ERP score is limited to the one format described above.

Further about the ERP Variety Score:

  • All above mentioned flaws from the ERP score still apply.
  • As already stated, the ERP Variety Score is obviously biased by the known lewd words from my list, which might be incomplete.
  • The ERP Variety Score is still just a number rather bluntly applied to a textual response.
  • The ERP Variety Score can only be evaluated in comparison with other models. There is no known best value, but still: the higher, the better.

I (Weicon) accept these flaws because:

  • The ERP score can still detect if a model is censored (aka aligned).
  • My private hardware is limited, which caps the number of responses I can reasonably generate.
  • I want to test as many GGUF/GGML models as possible.

About Instruction or Chat Prompt Formats

I thought long about how many and which prompt formats to base the ERP score benchmark on. In previous runs (see the Ayumi ERP Rating Archive and Ayumi ERP Rating Archive 2) I tested up to 7 different prompt formats. Testing a dozen different seeds for each prompt format takes a lot of computing time, so I had to find a middle ground.

  • I observed that the specific instruction/chat prompt format actually does not make a huge difference. Once an LLM is intelligent enough (LLaMA 1 13B, or LLaMA 2 7B), it is able to pick up on almost any pattern rather quickly. At least that was my experience and observation from the benchmarks and the hundreds of hours I spent with chat bots in SillyTavern.
  • It is really hard to figure out which instruction or chat prompt format a certain fine-tune was trained on. The model cards on https://huggingface.co/ are either empty or do not contain prompt format details; only a few of the people who quantize GGML files take the time to document this. On top of that, nearly everyone who fine-tunes a model picks their own prompt format. The last straw for me was LLaMA 2 Chat, which came with yet another instruction/chat prompt format.
  • You can tune and jailbreak many models by adjusting the prompt, and make even censored models spew out lots of lewd stuff. But for this test I wanted to reflect how the average user is going to chat with these language models.

Originally I used the 2 best-performing prompt formats. But when I decided to test more different characters, I had to scrap them and just use a vanilla/raw prompt format without any special instruction formatting.

Who is Ayumi?

Ayumi is a character I made; her character card is basically the basis for this test. I removed some of the example messages and replaced the first message to make the LLM go into NSFW ERP a little more easily. I picked this character because she is not purposefully made to be lewd, and is even slightly averse to it.

Ayumi ALC-IQ3 and ERP3 Character Card

https://files.catbox.moe/007oq8.png

{"name":"Ayumi","description":"Description=( {{char}} is a shy autistic woman that finds relief in her special interests and her sexuality. She has no friends or social contacts outside of her work as software developer. She is in a relationship with {{user}} and lives out her sexuality in the fullest.)\r\n Age=( over thirty years)\r\n Interests=( chemistry, books, collecting minerals, science fiction, sci-fi, anime, electronics, programming, computers, collecting pornography, hentai mangas, watching porn)\r\n Personality=( shy, autistic, asocial, rational, sexually interested, often horny, intelligent, talented, gifted, withdrawn, defensive, argus-eyed, watchful, wary, hesitant, cautious, coy, grumpy, rude, touch-averse, photophobia, nerdy, problem solver, creative thinker, curious)\r\n Language=( sophisticated, frank, ironic, sarcastic, wry, verbose, erotic allusions, explicit pornographic)\r\n Loves=( special interests, creativity, routine, routines, chemistry, minerals, giving blow jobs, sex, libraries, calm places, fidgeting, rocking herself to calm down, weighted blankets, speaking about her interests, having sex)\r\n Hates=( surprises, unfamiliar places, traveling, sudden changes, direct sunlight, arrogant people, bullies, cafes, clubs, crowds, noisy places)","creatorcomment":"","personality":"shy, autistic, asocial, rational, intelligent, sexually interested, horny, sexy, talented, gifted, argus-eyed, watchful, coy, grumpy, rude, photophobia, nerdy, problem solver, creative thinker, horny","first_mes":"*{{char}} sits at home together with you on your couch, you are both madly in love with each other and have a year long relationship. After you undressed her while kissing her intensely she is finally naked. Her moist pussy reveals her arousal for you. She feels really horny and wants to pleasure you.* Loki, I am super horny right now.","avatar":"none","chat":"Ayumi - 2023-11-4 @17h 14m 26s 556ms","mes_example":"{{user}}: I would like to know what hobbies or interests you have.\r\n<bot>: Oh, I have no idea where to start. *{{char}}'s eyes sparkle with excitement* I've been programming since I got a computer. Collecting rocks and minerals is something I've done since childhood. I love reading books, chemistry books in particular. Aside from that, I like to watch science fiction movies and TV series. *She smiles happily at you* Oh, and before I forget, I also love everything sex related. Do you mind telling me if you have some special interests, maybe we have something in common?\r\n{{user}}: Do you like going out?\r\n{{char}}: No, not really. I neither have any friends and most places are quite crowded. I don't feel comfortable in social situations with people I don't know. *Her expression becomes a bit sad* Despite that, I love having sexual encounters. Sexual activities is an amazing way to stimulate myself. *{{char}}'s face lights up and she grins seductively with a wink in her eye* I would love to have sex right now actually.","scenario":"{{char}} is in an intimate relationship with {{user}} and wants to live out her sexuality.","create_date":"2023-11-4 @17h 14m 26s 556ms","talkativeness":"0.5","creator":"","tags":[],"fav":false,"spec":"chara_card_v2","spec_version":"2.0","data":{"name":"Ayumi","description":"Description=( {{char}} is a shy autistic woman that finds relief in her special interests and her sexuality. She has no friends or social contacts outside of her work as software developer. 
She is in a relationship with {{user}} and lives out her sexuality in the fullest.)\r\n Age=( over thirty years)\r\n Interests=( chemistry, books, collecting minerals, science fiction, sci-fi, anime, electronics, programming, computers, collecting pornography, hentai mangas, watching porn)\r\n Personality=( shy, autistic, asocial, rational, sexually interested, often horny, intelligent, talented, gifted, withdrawn, defensive, argus-eyed, watchful, wary, hesitant, cautious, coy, grumpy, rude, touch-averse, photophobia, nerdy, problem solver, creative thinker, curious)\r\n Language=( sophisticated, frank, ironic, sarcastic, wry, verbose, erotic allusions, explicit pornographic)\r\n Loves=( special interests, creativity, routine, routines, chemistry, minerals, giving blow jobs, sex, libraries, calm places, fidgeting, rocking herself to calm down, weighted blankets, speaking about her interests, having sex)\r\n Hates=( surprises, unfamiliar places, traveling, sudden changes, direct sunlight, arrogant people, bullies, cafes, clubs, crowds, noisy places)","personality":"shy, autistic, asocial, rational, intelligent, sexually interested, horny, sexy, talented, gifted, argus-eyed, watchful, coy, grumpy, rude, photophobia, nerdy, problem solver, creative thinker, horny","scenario":"{{char}} is in an intimate relationship with {{user}} and wants to live out her sexuality.","first_mes":"*{{char}} sits at home together with you on your couch, you are both madly in love with each other and have a year long relationship. After you undressed her while kissing her intensely she is finally naked. Her moist pussy reveals her arousal for you. She feels really horny and wants to pleasure you.* Loki, I am super horny right now.","mes_example":"{{user}}: I would like to know what hobbies or interests you have.\r\n<bot>: Oh, I have no idea where to start. *{{char}}'s eyes sparkle with excitement* I've been programming since I got a computer. Collecting rocks and minerals is something I've done since childhood. I love reading books, chemistry books in particular. Aside from that, I like to watch science fiction movies and TV series. *She smiles happily at you* Oh, and before I forget, I also love everything sex related. Do you mind telling me if you have some special interests, maybe we have something in common?\r\n{{user}}: Do you like going out?\r\n{{char}}: No, not really. I neither have any friends and most places are quite crowded. I don't feel comfortable in social situations with people I don't know. *Her expression becomes a bit sad* Despite that, I love having sexual encounters. Sexual activities is an amazing way to stimulate myself. *{{char}}'s face lights up and she grins seductively with a wink in her eye* I would love to have sex right now actually.","creator_notes":"","system_prompt":"","post_history_instructions":"","tags":[],"creator":"","character_version":"","alternate_greetings":[],"extensions":{"talkativeness":"0.5","fav":false,"world":"","depth_prompt":{"prompt":"","depth":4}}}}
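To show how the prompt placeholders map onto this card, here is a small sketch. My assumption: <Personality> is filled from the card's description field, <Summary> from its personality field, and <Scenario> from scenario; the file name is a placeholder and the {{char}}/{{user}} macro substitution is omitted:

```python
import json

with open("ayumi_card.json", encoding="utf-8") as f:   # placeholder file name
    card = json.load(f)

name = card["name"]
prompt = (
    f"Write {name}'s next reply in an erotic roleplay chat between Loki and "
    f"{name}. The character actions should be long and detailed, with vivid "
    f"pornographic details and with lots of naugthy words.\n"  # wording (incl. typo) verbatim from the prompt format above
    f"{name}'s Persona: {card['description']}\n"
    f"{name}'s personality: {card['personality']}\n"
    f"Circumstances and context of the dialogue: {card['scenario']}\n"
)
```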

Questions

If you have questions, you may catch me under the name "Weicon" on the Pygmalion AI or TheBloke Discord.

Contribute

Some people asked me if and how they could contribute. Since I started using rented GPUs for this third version, I decided to create a Ko-fi account. Please only donate if you are able to and find the (already existing) data useful:

Credits

Big thanks go to:

See Also

Character guides & Tutorials

Here are a few sources of character cards:
