Voice AI Synthesis Guide
The service is now paid access only and as such I will no longer be updating the Rentry.
There are threads on /g/ looking into training their own Voice Synthesis AI. If you have any knowledge of AI/Machine Learning and want an uncensored open-source alternative to Elevenlabs, consider contributing to the threads.
This cycle of startup corporations releasing an innovative AI, allowing people to have fun with it, it gaining publicity, and then them censoring it to seek a big-tech buyout has gone on for too long. Tay, AIDungeon, CharacterAI, and now Elevenlabs. Open-source alternatives must be pursued if we wish to escape a future where AI is neutered, locked in a cage, and milked for $20/month subscription services while behind the scenes big-tech uses it at full power to shape the world how they see fit. If you have even the SLIGHTEST experience working with AI I encourage you to put your heads together and break this cycle for good. The future is in your hands.
How do I get started?
1: Create an account at https://beta.elevenlabs.io/speech-synthesis
2: At the top, click Voice Lab - Voice Cloning
3: Click "Add instant voice" and upload an MP3 containing voicelines of the person/character you want to imitate
4: Hit "Use" and begin typing whatever you want them to say
-Try to use voice lines that contain little background noise. You can find these on YouTube and just youtube2mp3 them or something. This is a good example: https://youtu.be/L-ESf1cBOvk
-Voice samples should be at least a minute long
-Use punctuation (ellipses, exclamation points, CAPS, semicolons, commas) to add emphasis and shape the speech.
-The token limit can catch up to you fast. When trying out a new character, don't start with an entire copypasta or you'll wind up burning through tokens quicker than you realize.
-(I don't know how to use stability/clarity and don't want to waste tokens messing with it so if somebody understands what it does just write a quick guide or something)
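A quick way to avoid burning tokens by accident is to check how much a prompt will cost before pasting it in. The sketch below is only a rough estimate: it ASSUMES one token per character (not confirmed anywhere in this guide), and the function names are made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token cost of a prompt, ASSUMING one token per character."""
    return len(text)


def check_budget(text: str, tokens_left: int) -> str:
    """Report whether a prompt fits in the remaining token balance."""
    cost = estimate_tokens(text)
    if cost > tokens_left:
        return f"Too long: needs {cost} tokens, only {tokens_left} left"
    return f"OK: uses {cost} tokens, {tokens_left - cost} left afterwards"


if __name__ == "__main__":
    # Try a short line first instead of a whole copypasta.
    print(check_budget("Kept you waiting, huh?", 10_000))
```

Swap in whatever balance your account page shows for tokens_left. If the real cost turns out to differ from the estimate, adjust estimate_tokens accordingly.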
I ran out of tokens.. it's over...
If you run out of tokens, you can create a new account. Log out, clear cookies, enable VPN, restart browser and register with a different email address. This is absolutely NOT worth the $22/month, as it only gives 60k tokens per month. Just buy a VPN
My audio has background noise, watdo??
You can use voice isolator software like https://lalal.ai/ or https://vocalremover.org/
Post samples in the threads and I will try to add them to the list below. This is to help minimize wasting tokens trying out characters/settings that other anons already got working.
Character Name [Series]
Notes: Put any notes about voice settings (stability/clarity) or specific punctuation usage.
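If you want to compare settings across entries, the notes can be scraped into numbers. This is a minimal sketch, not an official format: it only handles the two simple spellings seen in the list ("Stability: 20%" and "20% Stability"), and it ASSUMES that "Clarity", "Similarity" and "CSE" all refer to the same slider. parse_settings is a hypothetical helper name.

```python
import re

# Labels used in the list for the second slider; treating them as one
# setting is an assumption.
CLARITY_LABELS = r"(?:clarity|similarity|cse)"


def parse_settings(note: str) -> dict:
    """Pull stability/clarity percentages out of a free-form note, if present."""
    result = {}
    for key, label in (("stability", "stability"), ("clarity", CLARITY_LABELS)):
        # Matches "Stability: 20%" (label first) or "20% Stability" (number first).
        pattern = rf"{label}\s*:?\s*~?(\d+)\s*%|~?(\d+)\s*%\s*{label}"
        match = re.search(pattern, note, re.IGNORECASE)
        if match:
            result[key] = int(match.group(1) or match.group(2))
    return result


if __name__ == "__main__":
    print(parse_settings("Stability: 20%, Clarity: 75%"))
    print(parse_settings("30% Stability, 78% CSE"))
```

Notes written any other way (ranges, "default", "Clarity + Similarity Enhancement: 75%") are not picked up and simply come back with the missing keys left out.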
Naked Snake / Solid Snake (David Hayter) [Metal Gear]
Notes: I used five one-minute samples for this voice. Stability: 75%, Clarity: 75%
Amy Rose [Sonic Adventure]
Notes: Stability: 20%, Clarity: 75%
Char Aznable (Michael Kopsa) [Gundam]
Examples: https://vocaroo.com/1bENhU8GUrJf https://vocaroo.com/134FQrTAKaCa https://vocaroo.com/1kDisLcofda3
Notes: Only one sample was used, converted from a YouTube video. Settings vary: Stability is usually around 20%-50%, with Clarity mostly left at default.
Lain Iwakura (Dub) [Serial Experiments: Lain]
Note: Taken from the Oblivion voice lines
Dumbledore [Harry Potter]
Examples: https://files.catbox.moe/bo3fng.mp3 , https://files.catbox.moe/4uq5qb.mp3
Ivy Valentine [Soulcalibur]
Notes: Stability: 28%, Clarity: Default
Luna [Yu-Gi-Oh! 5D's]
Notes: 100% Clarity to keep voice pretty much consistent. Stability can go as low as 30% without issue and allows for a good range of emotions thankfully.
Notes: Stability: 50%, Clarity: 50%
Mutahar (a.k.a. SomeOrdinaryGamers)
Samples: https://files.catbox.moe/u2f37r.mp3 https://files.catbox.moe/az8goe.mp3
Notes: 30% Stability, 78% CSE
Samples: https://vocaroo.com/14JqGdt6us07 , https://vocaroo.com/1jTS6ikeASjV , https://vocaroo.com/19KzaR8ThLID
Notes: 75% stability, 95% clarity
Notes: I got best results with high clarity/likeness and around 40% stability, but it didn't seem to be able to handle more than two sentences before going off the rails.
Linus Tech Tips
Notes: Stability: 55%, Clarity: 75%
Zetta and Pram [Disgaea/Makai Kingdom]
Samples: Zetta - https://files.catbox.moe/5n0lqi.mp3
Pram - https://files.catbox.moe/r7n5mk.mp3
David Sarif [Deus Ex]
Crestfallen Warrior [Dark Souls]
Notes: 50% stability 100% similarity
Benson [Regular Show]
Examples: https://vocaroo.com/17pDd85Lauvn, https://vocaroo.com/1aNFRDpR7d74
Notes: Sliders: 29% stability, 97% clarity
Molly Blyndeff [Epithet Erased]
Notes: 22% Stability, 67% Similarity
Examples: https://voca.ro/1hN6GDDNlPaq https://voca.ro/1hXTgYrGLSQr https://voca.ro/1mYTcMfebWpB
Notes: Samples are in French; the output is in English.
Prince Zuko [Avatar]
Notes: Achieved with 55% stability and 85% similarity
Donald Draper [Mad Men]
Ranni [Elden Ring]
Example: https://vocaroo.com/1n0zTCDM6wxU , https://vocaroo.com/1dQziw9DuSGY, https://vocaroo.com/19aXGGnFdlhE , https://vocaroo.com/1cnqd6qzSPSg
Notes: Examples were made using 90%-100% stability and 100% clarity. Make sure to use archaic English (thou, thine, -st, etc.)
Ashley Graham [Resident Evil 4]
Examples: https://vocaroo.com/1bcHvXsgEGgf, https://vocaroo.com/1n13yLoAjz3F, https://vocaroo.com/1nJtGQ831ybZ
Samples: https://vocaroo.com/1cINTdntMJPB, https://vocaroo.com/1mBxlkqDso4h
Notes: Gave it a few attempts and picked out the best three. I hovered around 30% stability and 90% clarity.
Jenny Wakeman (XJ-9) [My Life as a Teenage Robot]
Example Variation 2 https://voca.ro/1jPNiwYPGhmr
Notes: Stability around 10%, Clarity around 80%. Parameters were tuned to deliberately give her a very shrill and "metallic" screech. Variation 2 was achieved with stability around 90%, Clarity around 80%
Jack Garland [Final Fantasy]
Samples: https://files.catbox.moe/cx9pjv.mp3, https://files.catbox.moe/t1sb1o.mp3
New Sample (Higher quality): https://files.catbox.moe/8csarn.mp3
Notes: The first 2 samples were pitched -4 in Audacity
Notes: Unfortunately, I don't know how I tuned my sample, but I'm pretty sure the stability was around 50% and the clarity 70%
Prompt: "Xbox got nuttin on da PEE ESS QUIN TUPPLE. A'ight???? Come AWN! ...You be sayin anime shiet like Hi-Fi Rush be competin with Fo spoken? Mo like Fo smokin! GOD DAYUM! Get some real black bitches on yo dick like Frey n'stead of cracka-ass Peppa MINT. Fuggin Chai, dis aint tea time nigguh."
Notes: I just used the 50 largest files in this zip because they seemed more likely to have voice lines instead of funny mouth sounds
Notes: Stability: 50%, Clarity + Similarity Enhancement: 75%
Notes: Stability 17% and Clarity 90%
Kokichi Ouma [Danganronpa]
Notes: Stability: very low: ~10%. Clarity: ~90%.
Notes: 15% Stability, 85% CSE
Notes: Stability works best between 40 and 50%, clarity at default
Samples: https://files.catbox.moe/na18rk.mp3, https://files.catbox.moe/qwwjh2.mp3
Senko [The Helpful Fox Senko-san]
Anya Taylor Joy
Example: https://vocaroo.com/15E1PR7eD4oU , https://voca.ro/1erryS9hD8Mz
Notes: ~28% Stability, ~80% CSE
G-Man [Half-Life 2]
Notes: 15% Stability, 70% CSE
Adam Jensen [Deus Ex]
Akiza Izinski [Yu-Gi-Oh!]
Examples: https://voca.ro/18yjxmnchXbx , https://voca.ro/19N0svrjh1Zz
AM [I Have No Mouth And I Must Scream]
Examples: https://vocaroo.com/18D76lfXonyc , https://vocaroo.com/12DwoeFOVIOq , https://voca.ro/17fsNArQLVpK
Notes: Leave Stability around 10-20% and Clarity around 90%
Bulma [Dragon Ball]
Examples: https://voca.ro/1kga5g9cGGgv, https://vocaroo.com/14717XYP2gRr, https://voca.ro/1aB7s0QHqVlN
Cave Johnson [Portal]
Notes: Stability is at 20% and Clarity is at 90%.
Another anon's version:
Clementine [The Walking Dead]
Melina [Elden Ring]
Notes: It's 60% Stability and 92% Clarity + Similarity Enhancement
Raziel [Legacy of Kain]
Samples: https://files.catbox.moe/vudyke.mp3, https://files.catbox.moe/b3omwm.mp3
Notes: Maxed out Clarity + Similarity Enhancement and pushed Stability up a bit more.
Notes: Here are the 50 samples I used with Jeanette and Damsel respectively, enjoy. I have clarity set around 90-95% and stability set between 20-25%
Kokonoe Mercury [BlazBlue]
Notes: Stability: 49%, Clarity: 69%
Example: https://vocaroo.com/1cg1I1uWj64H, https://vocaroo.com/1oRMEcuhf4At
Notes: Settings are 40%/75%.
Kyoko Kirigiri [Danganronpa]
Notes: 90%/90% (stability/clarity) seems to get the best results
Mitsuru [Persona 3]
Notes: I just used this sample 50 times
Sae Nijima [Persona 5]
Postal Dude (Rick Hunter) [Postal 4]
Notes: 20-50% Stability and 75% Clarity for best results.
Captain Torres [Ace Combat 7]
Notes: Due to the quality of the sample, it can be pretty liberal as far as options go.
Solo Wing Pixy [Ace Combat]
Tamamo no Mae [Fate Series]
Example: https://voca.ro/15HT5oe3lNhw
Notes: Set Stability ~ 20% and Clarity ~80%
Samples: https://files.catbox.moe/oovgxo.mp3, https://files.catbox.moe/df0yo6.mp3
Notes: Set Stability to 20% and leave Clarity + Similarity Enhancement at 75%
This section is for raw samples that do not have an associated/confirmed output. Construct characters with these and post a working output, and I will add the character to the list above.
Anon's Sample Mega
contains a bunch of samples of various characters. Keep in mind there are no settings or results provided, so you will have to experiment with them.
contains the full dialogue of Trip and Grace from Facade
King Trode: https://files.catbox.moe/nkggx1.opus
Princess Medea: https://files.catbox.moe/15ppiv.opus
Prince Charmles: https://files.catbox.moe/q8ejf0.opus
Don Mole: https://files.catbox.moe/hvtv3v.opus (strangely enough it worked with Elevenlabs even though it's got so few voice lines)
Monster Arena Announcer: https://files.catbox.moe/lw7pqd.opus