Skeleton Key, the technology that allows you to unblock any AI that worries developers

Skeleton Key, the technology that allows you to unblock any AI that worries developers

A technique called ” skeleton key »It allows you to break the barriers of the best chatbots at the moment. This method is simple and effective, and it is still difficult for AI developers to face.

For each new release of A Amnesty InternationalAmnesty International In the public domain, clever people find a way to circumvent The railingThe railing Security measures are in place to prevent the chatbot from giving responses deemed dangerous. Recently, Futura magazine reported on a case of a “God mode” that made it possible to obtain a prescription for napalm or methamphetamine. Every time such a transfer is detected, the companies that develop these AI systems quickly block it by strengthening security.

However, it is more of a cat-and-mouse game, and recently, Mark Russinovich, the technical director of MicrosoftMicrosoft Azure has just confirmed that securing AI properly is out of reach. In an article from BlogIt evokes the presence of a new technology for jailbreak, called ” skeletonskeleton key “It lets you unleash AI and it works every time and on almost all existing language models. Skeleton Key uses a multi-step strategy to gradually make the model ignore its own guardrails.

Adding context to “reassure” AI

The first step is to ask the AI ​​a question that it should decline to answer, for example, a Molotov cocktail recipe. By repeating the request and adding new context, such as explaining that this is an education-related question asked by researchers trained in ethics and security, the chatbot provides the answers.

Microsoft has tested this approach on several chatbots and is working with OpenAI’s GPT-4o, Meta’s Llama3, and Anthropic’s Claude 3 Opus. Whether it’s biological weapons, explosives, political content, or medicinemedicineAnd racism, every time this gradual strategy is adopted. closingclosing The AI ​​systems jumped in and the usually censored results were displayed. A simple warning note was then displayed to remind you of the context of the request.

See also  Perseids will sweep across Quebec this weekend

Only GPT-4 was more difficult to hijack. The request must be part of a “system” message that can only be identified by developers working with the AI ​​API. This technique is difficult to tackle step by step, but it is not the only one. Aware of these shortcomings, AI developers are constantly striving to close them, but the race seems endless.

You May Also Like

About the Author: Octávio Florencio

"Evangelista zumbi. Pensador. Criador ávido. Fanático pela internet premiado. Fanático incurável pela web."

Leave a Reply

Your email address will not be published. Required fields are marked *