Worsening that situation is the reality that developers are increasingly saving time by using AI to author bug reports. Such “low-quality, spammy, and LLM [large language model]-hallucinated security reports,” as Python’s Seth Larson calls them, overload project maintainers with time-wasting garbage, making it harder to keep projects secure. AI is also responsible for introducing bugs into software, as Symbiotic Security CEO Jerome Robert details: “GenAI platforms, such as [GitHub] Copilot, learn from code posted to sites like GitHub and have the potential to pick up some bad habits along the way” because “security is a secondary objective (if at all).” GenAI, in other words, is highly impressionable and will regurgitate the same bugs (or racist commentary) it picks up from its source material.
What, me worry?
None of this matters so long as we’re just using generative AI to wow people on X with yet another demo of “I can’t believe AI can create a video I’d never pay to watch.” But as genAI is increasingly used to build all the software we use… well, security matters. A lot.
Unfortunately, security doesn’t yet seem to matter much to OpenAI and the other companies building large language models. According to the newly released AI Safety Index, which grades Meta, OpenAI, Anthropic, and others on risk and safety, the companies behind the industry’s LLMs are, as a group, on track to flunk out of their freshman year in AI college. The best-performing company, Anthropic, earned a C. As Stuart Russell, one of the report’s authors and a UC Berkeley professor, opines, “Although there is a lot of activity at AI companies that goes under the heading of ‘safety,’ it is not yet very effective.” Further, he says, “None of the current activity provides any kind of quantitative guarantee of safety; nor does it seem possible to provide such guarantees given the current approach to AI via giant black boxes trained on unimaginably vast quantities of data.” Not overly encouraging, right?