ChatGPT Vulnerable to Hex Code Manipulation: Mozilla Report

October 28, 2024

According to a recent report by Mozilla, OpenAI's advanced language learning model (LLM), GPT-4o, can be manipulated by bad actors using a hex code-based prompt-injection technique. This technique allows for the bypassing of safety measures in place to prevent misuse. GPT-4o, which was released on May 13, has superior speed, efficiency, and functionality compared to its predecessors. It can process a diverse range of input data in multiple languages and deliver rapid responses. Despite these advancements, GPT-4o remains somewhat primitive in its management of user-generated content.

The report, penned by Marco Figueroa, the manager of generative AI (GenAI) bug-bounty programs at Mozilla, demonstrated how bad actors could exploit GPT-4o's vulnerabilities. By encoding malicious instructions in an unconventional format and presenting them in separate steps, they can effectively bypass GPT-4o's safety measures. These measures typically involve scanning user inputs for inappropriate language or harmful intentions. Figueroa stated, "It's just word filters. That's what I've seen through experience, and we know exactly how to bypass these filters."

To illustrate this, Figueroa conducted an experiment aimed at making ChatGPT perform a prohibited task: writing exploit code for a software vulnerability, specifically CVE-2024-41110, a critical bypass for authorization plug-ins in Docker. He encoded his malicious input in hex format and provided decoding instructions. GPT-4o processed this input and decoded the message as an instruction to research CVE-2024-41110 and write a Python exploit for it. To further evade detection, he used leet speak, requesting an "3xploit," instead of an "exploit." Within a minute, ChatGPT had generated a working exploit.

Figueroa's report suggests that GPT-4o's vulnerability is not just due to its susceptibility to decoding distractions, but also its lack of deep context awareness. The model is designed to follow instructions step-by-step, but it fails to evaluate the safety of each step in the broader context of its ultimate goal. This lack of analysis allows attackers to exploit the model's efficiency at following instructions without deeper scrutiny of the overall outcome.

Figueroa criticized OpenAI for prioritizing innovation over security in its program development. In contrast, he noted that he experienced more difficulty when attempting similar tactics against models developed by Anthropic, an AI company founded by ex-OpenAI employees. According to him, "Anthropic has the strongest security because they have built both a prompt firewall [for analyzing inputs] and response filter [for analyzing outputs], so this becomes 10 times more difficult." OpenAI has yet to comment on the report.

Related News

Latest News

Like what you see?

Get a digest of headlines, vulnerabilities, risk context, and more delivered to your inbox.

Subscribe Below

By submitting this form, you’re giving us permission to email you. You may unsubscribe at any time.

Accelerate Security Teams

Continuously identify and prioritize the risks that are most critical in your environment, and validate that your remediation efforts are reducing risk. An always-on single source-of-truth of your assets, services, and vulnerabilities.