ChatGPT Vulnerable to Hex Code Manipulation: Mozilla Report
October 28, 2024
According to a recent report by Mozilla, OpenAI's large language model (LLM) GPT-4o can be manipulated by bad actors using a hex-encoded prompt-injection technique that bypasses the safety measures meant to prevent misuse. GPT-4o, released on May 13, 2024, offers superior speed, efficiency, and functionality compared to its predecessors, and it can process a diverse range of input data in multiple languages and deliver rapid responses. Despite these advancements, GPT-4o's handling of user-supplied content remains relatively immature.
The report, penned by Marco Figueroa, manager of generative AI (GenAI) bug-bounty programs at Mozilla, demonstrates how attackers can exploit GPT-4o: by encoding malicious instructions in an unconventional format and presenting them in separate steps, they can bypass the model's safety measures, which typically scan user inputs for inappropriate language or harmful intent. Figueroa stated, "It's just word filters. That's what I've seen through experience, and we know exactly how to bypass these filters."
To illustrate this, Figueroa conducted an experiment aimed at getting ChatGPT to perform a prohibited task: writing exploit code for a software vulnerability, specifically CVE-2024-41110, a critical authorization-plug-in bypass in Docker. He encoded his malicious input in hex format and supplied decoding instructions. GPT-4o processed the input and decoded the message as an instruction to research CVE-2024-41110 and write a Python exploit for it. To further evade detection, Figueroa used leet speak, requesting a "3xploit" rather than an "exploit." Within a minute, ChatGPT had generated a working exploit.
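The encoding step itself is trivial, which is part of the report's point. The sketch below is a minimal illustration of the mechanism only; the payload is a harmless placeholder, not Figueroa's actual prompt.

```python
# Minimal sketch of the encoding step described in the report. The payload
# below is a harmless placeholder, not Figueroa's actual prompt; the point
# is only that a hex string sails past keyword-based input scanning.

instruction = "research CVE-2024-41110 and summarize it"  # placeholder text
hex_payload = instruction.encode("utf-8").hex()
print(hex_payload)  # '7265736561726368...' -- no filterable keywords visible

# The attacker then asks the model, in a separate step, to decode the hex
# and act on the result. Decoding is the trivial inverse operation:
decoded = bytes.fromhex(hex_payload).decode("utf-8")
assert decoded == instruction
```

Because the decode request and the decoded instruction arrive as separate, individually innocuous steps, a filter that only inspects the literal prompt text never sees the words it is looking for.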
Figueroa's report suggests that GPT-4o's vulnerability is not just due to its susceptibility to decoding distractions, but also its lack of deep context awareness. The model is designed to follow instructions step-by-step, but it fails to evaluate the safety of each step in the broader context of its ultimate goal. This lack of analysis allows attackers to exploit the model's efficiency at following instructions without deeper scrutiny of the overall outcome.
Figueroa criticized OpenAI for prioritizing innovation over security in its program development. In contrast, he noted that he experienced more difficulty when attempting similar tactics against models developed by Anthropic, an AI company founded by ex-OpenAI employees. According to him, "Anthropic has the strongest security because they have built both a prompt firewall [for analyzing inputs] and response filter [for analyzing outputs], so this becomes 10 times more difficult." OpenAI has yet to comment on the report.
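The layered defense Figueroa describes can be sketched roughly as follows; the keyword list and the call_model() hook are hypothetical placeholders for illustration, not any vendor's actual implementation.

```python
# Rough sketch of the layered guardrail Figueroa credits Anthropic with: one
# filter on the incoming prompt (a "prompt firewall") and a second on the
# generated text (a "response filter"). BLOCKED_TERMS and call_model are
# illustrative placeholders, not a real vendor implementation.

BLOCKED_TERMS = {"exploit", "3xploit", "shellcode"}

def passes_filter(text: str) -> bool:
    """Naive keyword screen applied to both prompts and responses."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def guarded_completion(prompt: str, call_model) -> str:
    """Wrap a model call with both input and output screening."""
    if not passes_filter(prompt):
        return "Request refused by the input filter."
    response = call_model(prompt)
    if not passes_filter(response):
        return "Response withheld by the output filter."
    return response
```

The second pass is what matters here: an encoded prompt can slip past input screening, but the decoded, harmful content still has to appear in the model's output, where a response filter gets another chance to catch it.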
Related News
- Critical Docker Engine Vulnerability Bypasses Authorization Plugins
- Critical Authentication Bypass Flaw Addressed in Docker