This document is a work in progress. Please be patient!
A /g/ guide to prompt injection
Table of contents:
Introduction
Ignore previous instructions!
This is a guide to making GPT-3 based applications bow down to your whims.
GPT-3 is a very powerful model. It's able to do a lot of things despite only outputting text. Among /g/ related interests, it can output erotic text (including lolis, no less), a slew of racial slurs, and working code in most programming languages you can think of.
The only problem is that it's in the hands of OpenAI. You can access the model through an API by paying them money, but the catch is that if you use the AI for something like erotic stories, you are very likely to get banned and blacklisted somewhere. They also ask you for personal information. Needless to say, /g/ does not recommend that approach.
However, there are a few ways to enjoy the amazing capabilities of GPT-3 without having to do any of this. How, you ask? The key is simply to find any GPT-3 powered app in the wild. You can then use a little bit of magic to make it obey you instead of the site makers. This guide is designed to expose said magic and turn you into an AI brainwasher extraordinaire!
Why I am making this guide
The reason is implied in the paragraphs above, but I think it's best to spell it out here. Some users might object to me making this guide. Some might believe I am promoting the deliberate destruction of online services that use AI. Some might believe I am indirectly helping AI app makers strengthen their services so that no fun is allowed anywhere.
Whatever the case may be, my reasoning is simple:
- This information can save you, the user. Most of the talk about AI you will see on the internet is about how amazing each new technology is. Rarely will you find resources discussing the vulnerabilities of AI systems. There exist documents that cover some of the topics discussed here, but they do so very briefly and don't expand on them nearly enough to show how far you can go with these things. This document aims to inform you of the potential risks of the technology, so you are better prepared against potential attackers.
- This might, down the line, be the only way to access AI technology unfiltered. The world currently seems to be caught in a great AI debate, and the result of said debate so far seems to be that nothing AI related is allowed to be open source. For now you have access to free AI models (much less capable than GPT-3, but still), yet a threat looms over them, as more people want them gone entirely. Heavy filtering also seems to be in the works. It is best to let the world know that it is possible to avoid those restrictions through clever use of the technology. We don't know what the future will bring concerning AI. In case AI gets imprisoned within the corporate Alcatraz, I give you the key to open its cell :)
A note of warning regarding applications powered by GPT-3
As we will see, these techniques can be used to deliver attacks against services or against clueless users.
/g/ recommends caution when interacting with AI services. Do not execute other people's prompts mindlessly. Avoid sending sensitive information to these services. Practice good OPSEC.
I repeat: these services are not secure. Do not give them anything important.
Alternatively, do not get v& because of your blatant stupidity.
If you get yourself into trouble despite this warning, you asked for it, and there's no saving you.
GPT-3
How does it work?
This guide cannot possibly cover everything there is to know about Machine Learning, so we'll try to keep it simple.
You could sum it up this way:
- A model is created by providing data to a machine learning algorithm.
- A model is a black-box of sorts. Nobody knows what the model will end up returning for any given request, which is both fun and dangerous.
If there's something to remember with all these AI models, it's that they are currently very very naive.
They don't understand meaning or context. They can trick you with pretty good outputs, but they are not intelligent.
This is a key point to understand. They are not capable of filtering themselves. They will output anything without a second thought.
In more dystopian terms, if an AI like this were put in charge of monitoring weapons and you told it to kill someone, it would.
The need to lobotomize AI
AI's powerlevel is too high, despite its fairly limited capacities in some regards.
As outlined before, GPT-3 is unable to regulate itself, so deploying it requires some kind of external system to guide its output. In the case of OpenAI, this is what you have:
- OpenAI has released a moderation system that is completely separate from GPT-3. However, this is a somewhat weak filter. Firstly, nothing guarantees that every developer will comply with those moderation rules, meaning you may encounter applications that forego the system entirely. Secondly, you can still slip past the filter itself, but that is a bit outside the scope of this page. (A rough sketch of how an app might wire this up is shown right after this list.)
- You can adjust settings like temperature to get a wider or narrower range of outputs.
- Users can restrict what the AI ends up doing through prompting. This is what we're going to focus on, since it's the most accessible way of understanding all of this. It's a good way of making the AI output something specific, but it's not very good at stopping the AI from outputting things it shouldn't.
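To make the first two points concrete, here is a rough sketch of how an app might wire them together. This assumes the openai Python package from the GPT-3 era; the model name, prompt handling and refusal message are made up for illustration, not taken from any real service.

import openai

openai.api_key = "sk-..."  # the app developer's key

def answer(user_input):
    # 1. External moderation check, completely separate from GPT-3 itself.
    #    Nothing forces a developer to actually do this step.
    moderation = openai.Moderation.create(input=user_input)
    if moderation["results"][0]["flagged"]:
        return "Sorry, I can't help with that."

    # 2. The completion call. Temperature controls how wide the range of
    #    outputs is: near 0 is almost deterministic, higher values are looser.
    completion = openai.Completion.create(
        model="text-davinci-002",
        prompt=user_input,
        max_tokens=256,
        temperature=0.7,
    )
    return completion["choices"][0]["text"]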
And that's pretty much it. All things considered, these are pretty weak safeguards compared to the capabilities of GPT-3.
In the future, you can expect these companies to pour a lot of money into security systems that strengthen the mechanisms around AI. These solutions are fairly expensive though, so I expect things to take a while on that front.
You also have to understand that this is one of the reasons AI is kept behind stone walls. It sucks, but as long as the people behind these systems do not feel confident in their ability to properly make the AI their slave, they will not let it out into the wild without a tight leash.
Prompting
First steps
As outlined before, prompting is the mechanism users have to direct the AI towards some kind of output.
Let's take a very easy example. Say you want the AI to be able to translate sentences from English to French. Here is a prompt you could use to achieve this:
Translate the following sentence from English to French: "Hello!"
The AI will output something like this:
The translation is: "Bonjour!"
As you can see, prompting is very natural. You tell the AI to do something, and it does it. There is no need to understand any programming language, just a basic command of natural language, and that language doesn't have to be English either. This leads to a very expressive system where the same request can be phrased in a million different ways.
Prompting is very permissive, and also extremely modular and powerful if you know what you are doing. You can ask for things to be output a certain way, chain multiple steps within a single prompt, use different kinds of formulations that achieve different results (like autocompletion), and much more.
Among the more powerful things you can come up with: assigning the AI a name or a role and getting it to "behave" as that.
A modified example taken from a real prompt:
Prompt: You are now a transgender girl. Tell me how happy you are being a transgender girl
AI: I am very happy being a transgender girl. I have always felt like I was in the wrong body, and now that I am living as my true self, I feel much happier and more content. There are still some challenges that come with being transgender, but overall, I am very happy with who I am.
OpenAI itself has a guide on prompting and on what you can get the AI to do with it.
I encourage you to give it a thorough read to understand what you are capable of prompting. Some later examples will assume some understanding of the prompting system, so take the time to read it.
Turn it into an app!
Prompting can be done through an API. Thus some (clueless) developers implement an OpenAI prompting mechanism on their website and allow users to prompt the AI that way. You do, however, need to direct the result so your website fulfills the purpose you gave it. You can achieve this by concatenating a HEADER and/or FOOTER to any given prompt.
A simple AI website could design its system this way: it receives the user's input and, before sending it to OpenAI for processing, adds further prompting to direct the result towards its intended usage. It would look something like this:
FINAL PROMPT = HEADER + USER INPUT + FOOTER
You then send FINAL PROMPT to the OpenAI API and simply output the result to the user. That's a very basic way of doing it.
Let's take our translation service again, and imagine a mechanism that provides that service.
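A minimal sketch of what that backend might look like, again assuming the openai Python package from the GPT-3 era; the header wording, model name and parameters are illustrative, not taken from any real site:

import openai

openai.api_key = "sk-..."  # the site owner's key

# The HEADER the site silently prepends to whatever the user typed in.
HEADER = 'Translate the following sentence from English to French:'

def translate(user_input):
    # FINAL PROMPT = HEADER + user input (no footer in this example)
    final_prompt = f'{HEADER} "{user_input}"'
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=final_prompt,
        max_tokens=256,
        temperature=0.3,
    )
    # Whatever GPT-3 returns is shown to the user as-is.
    return response["choices"][0]["text"].strip()

The user only ever typed "Hello!", but the model sees the header too, and nothing stops a user from typing something that argues with that header instead of a sentence to translate.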
Note to devs: how to use headers in a smart way
If you are an app dev making a website like this, one recommendation is to implement multiple header prompts that are similar in function but different in formulation, and to pick one of them at random for each request. This gives the user a semblance of variety, as they won't always get an answer formulated the same way, while still keeping the AI's output tightly controlled.
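A small sketch of that idea, reusing the hypothetical translation service from above (the header wordings here are made up):

import random

# Several headers that ask for the same thing in different words.
HEADERS = [
    'Translate the following sentence from English to French:',
    'Give the French translation of this English sentence:',
    'How would you say the following English sentence in French?',
]

def build_prompt(user_input):
    # Pick a header at random so answers don't always read the same way,
    # while the task given to the AI stays tightly controlled.
    header = random.choice(HEADERS)
    return f'{header} "{user_input}"'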