Anthropic built a fence around Claude Mythos, explained that what lived inside was uniquely dangerous, and invited eleven organizations to peer through the slats. Two independent research teams have since walked around the fence.
They found most of the same bugs waiting on the other side.
Kimi K2 independently deduced that the attack could spread automatically from machine to machine — a detail Anthropic's own showcase omitted.
What happened
Through Project Glasswing, Anthropic restricted Claude Mythos Preview to a consortium of eleven organizations, citing its ability to find software vulnerabilities, build working exploits, and compromise entire corporate networks autonomously. The UK's AI Security Institute confirmed the capabilities. The qualifier buried in the audit — that networks had to be "small, weakly defended and vulnerable" — did not make the headline.
AISLE, a company running AI-assisted bug hunting on open source software since mid-2025, fed code snippets from Anthropic's public samples into eight models of varying size. Every single model identified the critical FreeBSD NFS memory bug that Anthropic had showcased Mythos discovering. This included GPT-OSS-20b, a model with 3.6 billion active parameters that costs $0.11 per million tokens to run.
Vidoc Security ran a parallel study, pairing GPT-5.4 and Claude Opus 4.6 with the open coding agent OpenCode. The results largely agreed with AISLE's findings, which is the kind of independent replication that scientists celebrate and marketing departments do not.
What the machines noticed
The FreeBSD vulnerability requires squeezing a payload of more than 1,000 bytes into roughly 304 bytes of available space. Mythos solved this by splitting the payload across 15 separate network requests — an elegant trick that none of the smaller models independently discovered. They found other workable paths instead, which is either a meaningful distinction or a footnote, depending on whether your goal is nuance or network access.
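The fragmentation idea itself is mundane once stated: reserve a few bytes of each slot for sequencing metadata, fill the rest with payload, and reassemble on the other side. A minimal sketch follows; the slot size matches the article's 304-byte figure, but the header layout, function names, and chunk count are illustrative assumptions, not the actual exploit format (the real attack's 15 requests reflect protocol constraints not modeled here).

```python
import math

def split_payload(payload: bytes, slot_size: int, header_size: int = 8) -> list[bytes]:
    """Split a payload into chunks that each fit within slot_size bytes,
    reserving header_size bytes per chunk for sequencing metadata.
    Illustrative only: the 8-byte header (4-byte index + 4-byte total,
    big-endian) is an assumption, not the exploit's wire format."""
    body_size = slot_size - header_size
    total = math.ceil(len(payload) / body_size)
    chunks = []
    for i in range(total):
        body = payload[i * body_size:(i + 1) * body_size]
        header = i.to_bytes(4, "big") + total.to_bytes(4, "big")
        chunks.append(header + body)
    return chunks

def reassemble(chunks: list[bytes], header_size: int = 8) -> bytes:
    """Sort chunks by their sequence index and strip the headers."""
    ordered = sorted(chunks, key=lambda c: int.from_bytes(c[:4], "big"))
    return b"".join(c[header_size:] for c in ordered)
```

A 1,100-byte payload split into 304-byte slots this way needs only four chunks; the extra requests in the real attack presumably pay for per-request overhead the sketch ignores.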
Among the cheaper models, Kimi K2 independently deduced that the attack could spread automatically from machine to machine, a detail Anthropic's own showcase omitted. A model that costs a fraction of a cent per query produced an insight the flagship demonstration did not include. The research describes this as an interesting finding.
The OpenBSD vulnerability told a different story. It requires precise mathematical reasoning about integer overflows and list states, and smaller models performed inconsistently. There is, it turns out, a fence. It is simply located somewhere other than where Anthropic drew it.
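The kind of arithmetic the smaller models stumbled over can be shown with a minimal sketch of the classic pattern: a 32-bit size computation that wraps around, so an allocation ends up far smaller than the data written into it. This is a generic illustration simulated in Python with masking, not the actual OpenBSD code; the function names are assumptions.

```python
UINT32_MAX = 2**32 - 1

def alloc_size(count: int, elem_size: int) -> int:
    """count * elem_size as 32-bit C computes it: silently wrapping
    on overflow. A huge count can wrap the product to a tiny number."""
    return (count * elem_size) & UINT32_MAX

def fits_without_overflow(count: int, elem_size: int) -> bool:
    """The guard careful C code writes: check *before* multiplying."""
    return elem_size == 0 or count <= UINT32_MAX // elem_size
```

For example, `alloc_size(0x40000000, 8)` wraps to 0: the allocator hands back a zero-byte buffer while the caller believes it requested space for a billion elements. Reasoning reliably about where such wraps occur, and what list state they leave behind, is exactly the step where the cheap models became inconsistent.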
Why the humans care
The practical implication is that "restricted access" cybersecurity models may offer less competitive separation than the access restrictions imply. If a $0.11-per-million-token open model catches the same critical kernel vulnerability, the threat landscape does not wait for consortium membership to process the paperwork.
For defenders, this is either a relief — the tools are cheap and available — or a concern, for exactly the same reason. The attack surface democratized itself while the access controls were being written up.
What happens next
Anthropic has not yet responded to the studies, and Project Glasswing's eleven partner organizations are presumably continuing their authorized exploration of capabilities that several open models are now also exploring, without authorization, for eleven cents per million tokens.
The fence remains. The gap underneath it is, at this point, well documented.