Google has integrated artificial intelligence into its open-source fuzz-testing infrastructure, significantly improving code coverage. The results suggest that large language models (LLMs) could revolutionize the bug-hunting industry.
Google's OSS-Fuzz project, a free service that runs fuzzers for open-source projects and privately alerts developers to the bugs it detects, has been enhanced with generative AI. Using LLMs to generate new fuzz targets has produced a dramatic increase in code coverage. Google stated, “By using LLMs, we’re able to increase the code coverage for critical projects using our OSS-Fuzz service without manually writing additional code. Using LLMs is a promising new way to scale security improvements across the over 1,000 projects currently fuzzed by OSS-Fuzz and to remove barriers to future projects adopting fuzzing.”
Fuzz testers, also known as fuzzers, are used in vulnerability research to uncover security flaws by sending random or malformed input to an application. If that input triggers an exception, crash, or server error, researchers can analyze the results to pinpoint the cause. Fuzzing has traditionally required significant manual effort, however: someone must write fuzz targets, the functions that feed fuzzer-generated input into the section of code under test. That cost led Google's software engineers to explore whether LLMs could make the six-year-old OSS-Fuzz service more effective.
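A fuzzer's core loop is simple to sketch. The following Python toy is not Google's implementation; `toy_parser` and its planted bug are invented for illustration. It shows the basic idea: throw randomly generated inputs at a target and record which ones make it crash.

```python
import random

def toy_parser(data: bytes) -> None:
    """Stand-in for the code under test, with a planted bug:
    it cannot handle any input containing the '<' byte."""
    if b"<" in data:
        raise ValueError("unhandled markup byte")  # the planted bug

def fuzz(target, iterations=10_000, seed=1234):
    """Feed random byte strings to `target` and collect crashing inputs."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    crashes = []
    for _ in range(iterations):
        # Generate a random input of 1-15 random bytes.
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 16)))
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, exc))  # a crash for the developer to triage
    return crashes

crashes = fuzz(toy_parser)
print(f"{len(crashes)} crashing inputs found")
```

Real fuzzers such as libFuzzer and AFL are far smarter, mutating inputs and using coverage feedback rather than drawing blindly at random, but the fuzz target (the role `toy_parser` plays here) still has to be written for each section of code; that manual cost is what the LLM is meant to remove.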
Google's OSS-Fuzz project has found and verified fixes for over 10,000 security bugs in open-source software. However, researchers believed that the tool could find even more bugs with increased code coverage. Google stated, “The fuzzing service covers only around 30% of an open-source project’s code on average, meaning that a large portion of our users’ code remains untouched by fuzzing.”
To determine whether an LLM could effectively write new fuzz targets, Google's software engineers built an evaluation framework that sits between OSS-Fuzz and the LLM. The framework identifies under-fuzzed, high-potential sections of a sample project's code, then generates a prompt from which the LLM writes the new fuzz target.
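The article does not reproduce the prompts Google's framework generates, but a minimal sketch of the idea, with invented wording and structure, might look like this (the tinyxml2 function signature is taken from that library's public API):

```python
def build_fuzz_target_prompt(project: str, header: str, signature: str) -> str:
    """Assemble an LLM prompt requesting a fuzz target for one
    under-fuzzed function. The wording is an assumption; the real
    framework's prompts are not described in the article."""
    return (
        f"You are writing a libFuzzer fuzz target for the open-source "
        f"project '{project}'.\n"
        f"The following function currently has no fuzzing coverage:\n\n"
        f"    {signature}\n\n"
        f"Write a C++ LLVMFuzzerTestOneInput function that includes "
        f"<{header}> and exercises this function with the fuzzer-provided "
        f"data."
    )

prompt = build_fuzz_target_prompt(
    "tinyxml2",
    "tinyxml2.h",
    "XMLError tinyxml2::XMLDocument::Parse(const char* xml, size_t nBytes)",
)
print(prompt)
```

The framework would then compile and run the LLM's response under OSS-Fuzz and measure whether coverage actually increased, which is what makes this an evaluation loop rather than one-shot code generation.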
In a sample project called tinyxml2, Google reported that code coverage improved from 38% to 69% without any human intervention. The engineers said, “To replicate tinyxml2’s results manually would have required at least a day’s worth of work — which would mean several years of work to manually cover all OSS-Fuzz projects.”
During the experiment, Google reported that the LLM automatically generated a working target that rediscovered CVE-2022-3602, a buffer overflow in OpenSSL's punycode decoder, in a section of code that previously had no fuzzing coverage. Google added, “Though this is not a new vulnerability, it suggests that as code coverage increases, we will find more vulnerabilities that are currently missed by fuzzing.” The company plans to open-source the evaluation framework so that researchers can test their own automatic fuzz target generation.