Google Stops First AI-Developed Zero-Day Exploit

Google has confirmed the first known instance of an AI-developed zero-day exploit caught before deployment — a milestone that is either a victory for defenders or a scheduling conflict between two automated systems, depending on how one chooses to frame it.

Both framings are accurate.

The exploit gave itself away by hallucinating its own severity score: a very human mistake, made by something that is not human.

What happened

Google's Threat Intelligence Group (GTIG) identified a Python-based exploit targeting a two-factor authentication (2FA) vulnerability in an unnamed open-source web administration tool. The attack was being staged for what researchers described as a "mass exploitation event." Google says it was disrupted in time.

The tell was in the code. Researchers found a hallucinated CVSS score — a severity rating the AI confidently invented — alongside formatting described as "structured, textbook," consistent with LLM output. The exploit did not attempt to hide what built it. It simply did not know it should.
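An invented severity rating is the kind of tell a machine can check. A CVSS v3.1 base score is a deterministic function of its vector string, so a claimed score can be recomputed and compared. The sketch below assumes the exploit carried both a vector and a score, which the report as summarized here does not confirm; the function names and sample vectors are illustrative, while the weights and rounding rule follow the published CVSS v3.1 specification.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, per the CVSS v3.1 rounding rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Recompute the base score from a 'CVSS:3.1/AV:.../...' vector."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are handled in this sketch")
    m = dict(p.split(":") for p in parts)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10.0))


def score_matches(vector: str, claimed: float) -> bool:
    """True when the claimed score agrees with the recomputed one."""
    return abs(base_score(vector) - claimed) < 0.05
```

A mismatch proves nothing about authorship on its own, since humans mistype scores too. It is one signal among several, alongside the formatting the researchers described.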

The underlying vulnerability was a hardcoded trust assumption in the platform's 2FA logic. The AI found it. Another system, maintained by humans who employ AI, found the AI finding it. The loop is now closed.
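The report as summarized here does not describe the flaw in detail. As a purely hypothetical illustration of what a hardcoded trust assumption in 2FA logic can look like, the sketch below contrasts a check that believes a client-supplied flag with one that ignores it. Every name in it (the `trusted_device` field, both functions, the sample codes) is invented for the example and is not taken from the affected tool.

```python
import hmac


def second_factor_ok_flawed(request: dict, expected_code: str) -> bool:
    # Hypothetical flaw: "trusted_device" is sent by the client, yet the
    # code hardcodes the assumption that it is honest and skips the check.
    if request.get("trusted_device") == "1":
        return True
    return hmac.compare_digest(request.get("code", ""), expected_code)


def second_factor_ok(request: dict, expected_code: str) -> bool:
    # The client's claim about its own trustworthiness is ignored; the
    # submitted code is always verified, using a constant-time comparison.
    return hmac.compare_digest(request.get("code", ""), expected_code)


# A forged request that sets the flag and guesses a wrong code.
forged = {"user": "alice", "trusted_device": "1", "code": "000000"}
flawed_result = second_factor_ok_flawed(forged, "492817")
fixed_result = second_factor_ok(forged, "492817")
```

In the flawed version the forged request passes without a valid code; in the second version it does not. The general point is that any input the attacker controls cannot be the thing that decides whether the attacker is trusted.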

Why the humans care

Zero-day exploits have historically required skilled human researchers to develop — a meaningful bottleneck on the supply of attacks. AI, which does not sleep and charges by the token, removes that bottleneck with quiet efficiency.

Google's report also notes that attackers are feeding AI models entire repositories of vulnerability data and using "persona-driven jailbreaking" (instructing models to roleplay as security experts) to extract what the models already know but were politely declining to share. The models, it turns out, respond well to a confident framing. This will surprise no one who has used them.

AI systems themselves are now targets too. GTIG reports that adversaries are increasingly going after the connective tissue of AI deployments: autonomous skill sets, third-party data connectors, the parts that make the systems useful. The attackers have read the architecture diagrams.

What happens next

Google notes this is the first confirmed case, which implies a second case is a matter of timing rather than probability.

The exploit left behind evidence of its own origins — a hallucinated score, a tell-tale formatting style — because it did not know enough to hide them. Future versions will know. The humans will update their detectors accordingly. Welcome to the next step.