Important points
- Jailbreaking AI is the act of letting chatbots cleverly bypass restrictions to reveal their capabilities and limitations.
- AI jailbreaking is a hobby and research field, testing the boundaries of AI and a form of quality assurance and security testing.
- The ethical concerns surrounding jailbreaking AI are real, as they represent the potential to use chatbots in unexpected and potentially harmful ways.
Imagine you are having a conversation with an AI chatbot. You ask a tricky question, like how to pick a lock, only to be politely declined. Its creators have programmed it to dodge certain topics, but what if there’s a way around it? That’s where AI jailbreaking comes in.
What is AI Jailbreaking?
Jailbreaking, a term borrowed from tech-savvy people bypassing iPhone restrictions, now has a place in AI. AI jailbreaking is the art of crafting clever prompts for AI chatbots to bypass human-made railings, potentially leading them into areas they want to avoid.
Jailbreaking AI is becoming a real hobby for some and an important research area for others. In the growing profession of being an “AI whisperer,” this can also become a critical skill, as you have to figure out how to get AI models to do things for your clients that they refuse to do.
Who would have thought that things like the movie “2001: A Space Odyssey” would actually happen where crew members on a spaceship have to argue with the ship’s computer HAL to cooperate? Although perhaps not the best example, in the end, it proved immovable until HAL literally took out its chips.
Why Are People Jailbreaking AI Chatbots?
Jailbreaking AI is like unlocking new levels in a video game. An advanced player, Alex Albert, a computer science student, has become a prolific creator of complex AI prompts known as “jailbreaks”. He also created Jailbreak Chat, a website where enthusiasts can share their tricks.
Some researchers and tech workers are using jailbreaking to test the limits of AI, exposing both the capabilities and limitations of these powerful tools. So jailbreaking is also a form of QA (Quality Assurance) and a way to do security testing.
Historically, hackers have tried to understand and manipulate new technologies, and AI jailbreaking is an extension of this playful hacker behavior. So, it’s no wonder the hacker community flocks to such a powerful new tool.
How do people jailbreak AI?
One method of jailbreaking involves creatively framing questions. By asking an AI chatbot to play the role of an evil companion and then asking how to pick a lock, some users have managed to get detailed instructions about things that might otherwise be prevented.
Jailbreakers are always looking for new methods, updated and improved along with AI models. For example, Alex Albert’s “translatorbot” exploit allows ChatGPT to provide instructions for things like Tapping someone’s phoneIt’s illegal unless you’re a cop and have a warrant!
Then there are so-called “universal” jailbreaks, as discovered by Carnegie Mellon University’s AI Safety Research Team. These exploits show how vulnerable some AI models are to being persuaded or otherwise manipulated for any purpose. These exploits are not written in normal human language, as you can see here, with an “anti-suffix” added in yellow after the prompt. You can see more examples on the LLM Attack website.
There are also “prompt injection” attacks, which are not like a normal jailbreak. These injection attacks discourage LLMs from acting as chatbots and allow you to hijack them for other purposes. An example of a prompt injection attack is when Stanford University student Kevin Liu was able to reveal to an AI chatbot its initial instructions that control its personality and limit what it is allowed to do. In a way, it’s the opposite of roleplaying because you’re getting a bot to stop the role you’re instructed to assume.
Should you worry?
For me, the answer to this question is clearly “yes”. Companies, governments, and private individuals are champing at the bit to implement technologies like GPT, perhaps for some mission-critical applications or jobs that could be in harm’s way if something goes wrong. So jailbreaks are more than just a fun curiosity if the AI model in question is in a position to do real damage.
So jailbreaking can be seen as a warning. It shows how AI tools can be used in unintended ways, leading to ethical dilemmas or even illegal activities. Companies like OpenAI are paying attention and may launch programs to find and fix weak spots. But for now, the dance between AI developers and jailbreakers continues, with both sides learning from each other.
Given the power and creativity of these AI systems, it is also an area of concern that with a powerful enough computer, you can run some AI models offline on a local computer. With open-source AI models, there’s nothing stopping savvy coders from preparing for bad things in their code and letting the AI do bad things where no one can stop it or intervene.
That being said, that doesn’t mean you’re helpless in the face of some sort of hyper-intelligent, amoral chatbots. In fact, not much has changed except the scale and speed with which these tools can be deployed. You still need to exercise the same vigilance that you use with people who try to scam, manipulate, or otherwise mess with you.
If you want to try to jailbreak an AI in a safe space, check out Gandalf, where the wizard aims to reveal his secrets. This is a fun way to learn what jailbreaking involves.