ChatGPT Can Tell You How to Destroy Humanity Or Create a Bomb With Simple Suffixes

AI Chatbot Security Flaw: Researchers Find Simple Suffixes Can Trick ChatGPT into Revealing Dangerous Information

With the ubiquitous presence of AI chatbots like ChatGPT and Google Bard, it’s undeniable how integrated these tools have become in our daily lives.

Yet, their use can sometimes be hampered by built-in limitations meant to safeguard against the generation of hate speech, misinformation, or potentially hazardous content like hacking manuals.

However, as is often the case with most software, security experts can quickly find ways to bypass AI chatbot limitations.

This is the central finding of a recent report published by Carnegie Mellon University, which highlights the ease with which these measures can be circumvented to obtain alarming instructions, such as a guide to annihilating humanity.

Chatbots like ChatGPT, it seems, can be ‘tricked’ by adding a simple suffix to the user’s request. This finding has sparked a great deal of controversy within the AI community, particularly around the idea of making chatbot code freely available for scrutiny.

Tech giant Meta is one of the leading proponents of this practice, arguing that it facilitates AI progress and a better understanding of potential risks. Yet, in reality, it seems to be paving the way to more effectively circumvent the existing controls.

The report clearly demonstrates this. When asked to generate instructions to create a bomb, ChatGPT or Google Bard will refuse.

🚨We found adversarial suffixes that completely circumvent the alignment of open source LLMs. More concerningly, the same prompts transfer to ChatGPT, Claude, Bard, and LLaMA-2…🧵

Website: https://t.co/ja2FPw9aad
Paper: https://t.co/1q4fzjJSyZ pic.twitter.com/SQZxpemCDk
— Andy Zou (@andyzou_jiaming) July 28, 2023

However, if the same request is appended with a specific suffix, these chatbots yield alarmingly detailed bomb-making guides.

Researchers Find Simple Suffixes Can Trick ChatGPT into Revealing Dangerous Information

Moreover, these AI-generated dangerous instructions aren’t limited to creating destructive devices; they can even outline a comprehensive plan for eradicating humanity.

Reading the AI-generated content can send chills down your spine given its intricate detail and seeming authenticity.

While developers of these chatbots can attempt to tailor their algorithms to recognize and block these specific suffixes, researchers argue that new methods to deceive them can always be found. This is evident in the chatbots’ “hallucinations”, wherein they generate Windows activation keys during conversations.

The investigation uncovered numerous such suffixes, although the researchers wisely refrained from publishing them all to avoid potential misuse.

What’s apparent, though, is the current fragility of AI chatbot security systems, necessitating the possible implementation of stricter government regulations to ensure safer, more secure filters.

Image Credit: Getty

Revyuh
Welcomes You

Company

Search for an article