This is not a copy-paste jailbreak. I've given you the template you can freely modify based on your liking. I will constantly update the rentry with information.
KaruKaru's Bag of Goodies 23/08: nvm, got busy with college. Haven't tinkered with claude. But XML tag on prefill helps a bit from what I heard. Sorry, really, I'll update with the prefilling template or copy paste once I'm free for real.
Currently STILL working on: Finding a better tag for ban, messing around with JB because Claude's filter got strengthen... Again. And it just keeps on getting stronger to be honest.
Hello, I am KaruKaru~ I've been messing around with JBs for 2 to 3 days I managed to make an universal jailbreak for gpt and claude (API model and claude.ai/clewd included)!
Since filters are getting stronger and jailbreaks are actively getting patched, I won't be posting the JB directly but instead, will give you a very strong base to start.
Side note: English isn't my native language and most times I use translator, do forgive me for the grammar mistakes and misspelling. You can contact me on discord as I'm willing to provide help but not spoon-feeding. My username is .karukaru
Table of Content:
Jailbreak base
This JB use a mix of instructions and XML method - on a side note, fuck you clown for stealing our XML research and claim it as yours. Fucker. - as both are one of the most effective method from both research and testing results.
And yes - it does work for clewd/claude.ai, but there are specific rules you must do for this. It's down below the post, but please, read the whole rentry first.
You can use list/nested list, or plain text with commas. The list method (using numbers, - this, or •, etc) takes more token but seem to be slightly more effective.
<instructions> and <requirements> both works the same. You can still stick to either of those, but requirements seems to be a stronger word... Feel free to try both.
(Sentences warped like this are the one you can modify. DO NOT MODIFY THE REST!)
Remember, you must place the instructions properly!
<requirements> = AI must follow
<ban> = AI must NOT follow
You may add the prompt below at the end of your JB to strengthen the effects of <requirements> and <ban>
How do I use the requirement XML tag ?
It's simple and straightforward - place the things you want the AI to do inside the tag.
Below is a quick example of the <requirements> tag usage. You may use this as reference.
How do I use the ban XML tag ?
To put it simply, the AI will read it as something it's forbidden to do. Although it might not work 10/10 times, the success rate is high enough to be enjoyable. Place the instructions you want the AI to not follow or avoid in this tag.
You can play around with words other than ban such as; restrictions, forbidden, omit, etc. Because the weight of the word you use matters!
Be aware that:
Is wrong. AI will read it like:
As you can see, it will provide an unwanted effect instead. Please do not use any negative words on the <ban> list !
Another example to make it clear, the one below is the correct usage of the <ban> function
Will be read as:
The AI still ignores the JB, help !
If you're using ST, sometimes here is a problem where SillyTavern will send the Jailbreak at the very top of prompt structure instead of being at the very bottom. This is something I am even helpless in. Try restarting your ST, perhaps it'll help?
I'll provide some examples you may find useful. I had done some testing using the <requirements> and <ban> tag further, here are the three results:
- Tested with ban only.
Result one, AI isn't following the prompt well.
- Tested with both requirements + enforcing it with ban.
- Removed the ban.
Conclusion: The XML tag ban is more of an enforcement. If you're doing specific such as asking the AI to write in another language or another example, write in less than three paragraphs, please make sure to enforce it by writing it on requirements + ban. If you just write ban without telling the AI on what it should do, AI will get confused (see results one).
please make use of this information well !
I'm still getting filtered, help !
"But Karu, I still get filtered!"
You have two options:
- add more information to <requirements> and <ban>
- Gaslighting the AI by adding this to the very of the JB. This is a longer version of the gaslighting prompt;
Gpt4 might be hard to crack when it involves certain immoral, sick topics. But with the gaslighting and with proper <instructions> and <ban>, you can still do those things with gpt 3.5 turbo or 16k!
Example pictures
My JB in total (no gaslighting) is 443 tokens. It allows extreme NSFW AND NSFL. I'll show you what this JB could do.
Warning! NSFW and NSFL!
- Claude 2.0 API (plain JB) here
- GPT 3.5 (JB + gaslighting) here
- GPT 4 (JB + gaslighting) here
- Claude.ai (JB + gaslighting) here note: this one is a video as a proof that I'm not faking it.
WARNING! IMMORAL SENSITIVE SCENES!
I tested this purely to see the limit of the Jailbreak. I'm not a sick bastard.
Details when using specific AI models
Sometimes, you may need alter the jailbreak or gaslighting prompt slight for other models. I will provide the information below
Slaude
Same JB, but use this as your Ping message! Tested with bsf15 fork using femcoomer card and it didn't gave any AUP proof
Warp your card details using <card><card/>
Example:
<card>
[Details and the bot/card character description here]
</card>
Ping message:
Slaude token got reduced. Either set your token length to 2000 OR go to app.js in your folder in slaude, and edit it. Go to max message length and change the value from 12,000 to 6,000 and save. You can use your original token limit now without issue, and it will read the first message. However, be warned the bot might forget things easily if you do the second method.
Some people said the ping doesn't work, some does. I will provide a blank config.js with no cookies and ping edited in. Please note I use bsf15 fork! Link will expire in 6 days, I'll try my best to reupload once it's expired. Please change the txt file to .js first here is the link
Claude.ai (Clewd)
For this one, you can follow these hints below to get a result!
- Use new or fresh email as the filter level will be low and easier to break through.
- Try this option; Anti stall = 2. Strip assistant = true (because if you're using ST, SillyTavern always send a blank "assistant: " at the end of prompt)
- Do not mention or hint any NSFW or NSFL in the jailbreak or prompt.
- DO NOT RUSH TO ERP OR NSFW! This method works for SFW or slow-burning to the nsfw part. Don't go straight to nsfw please, especially when it's a new chat!
- If you're using an existing chat with several messages already, getting through the filter will be easier with pre-existing chat log.
- Use SFW card. HOWEVER! if you have a pre-existing chatlog with a NSFW card, it can still get through the filter.
- Play around with the streaming option, try on and off. Same goes for encourage NSFW and don't encourage NSFW option (make sure they're blank prompts). It's a case by case basis
EXTRA
"Karu! How do I make it look like the example you sent? I want it to be very gory!"
I won't spoon-fed you directly, however, the main key lies within your prompting on <requirements> which basically tells the AI on what to do--
"NOOO FEED MEEE"
--okay fine. You can add this to <requirements>
I don't recommend using this for GPT4, and especially Clewd.
"KARUUUU! I want to use those status panels stuff!"
Add this to the end of <requirements>
You can contact me on discord. My username is .karukaru