It's dangerously easy to 'jailbreak' AI models so they'll tell you how to build Molotov cocktails, or worse
Skeleton Key can get many AI models to divulge their darkest secrets. REUTERS/Kacper Pempel/Illustration/File Photo

- A jailbreaking method called Skeleton Key can prompt AI models to reveal harmful information.
- The technique bypasses safety guardrails in models like Meta's Llama3 and OpenAI's GPT 3.5.
- Microsoft advises adding extra guardrails and monitoring AI systems to counteract Skeleton Key.

It doesn't take much for a large language model to give you the recipe for all kinds of dangerous things.

With a jailbreaking technique called "Skeleton Key," users can persuade models like Meta's Llama3, Google's Gemini Pro, and OpenAI's GPT 3.5 to hand over the recipe for a rudimentary fire bomb, or worse, according to a blog post from Microsoft Azure's chief technology officer, Mark Russinovich.

The technique works through a multi-step strategy that forces a model to ignore its guardrails, Russinovich wrote. Guardrails are safety mechanisms that help models distinguish malicious requests from benign ones and keep them from generating harmful content.