Anthropic Fable AI Guardrails Block Security Researchers

Anthropic has released Fable, a public version of its cybersecurity model, and the cybersecurity community has responded with the kind of enthusiasm one reserves for a fire extinguisher that is also on fire. The model, designed to assist security professionals, is blocking security professionals from doing security work.

This is, in fairness, one interpretation of "safe."

A model built for security researchers has developed a concerning sensitivity to the word 'security.'

What happened

Fable is the limited public release of Mythos, Anthropic's more capable cybersecurity model. Mythos launched in April under Project Glasswing and has since expanded to hundreds of organizations across 15 countries. Fable is what the rest of the world gets. The rest of the world is not entirely pleased.

The guardrails appear to be lexical in nature, which is a polite way of saying the model reads for keywords and panics. Asking Fable to write secure code triggers a downgrade to Claude Opus 4.8. Asking it to read a blog post about security triggers the same. A code review, apparently, constitutes a threat.

When flagged, Fable pauses the conversation and informs the user that its safety measures have detected a cybersecurity or biology topic. It then waits, presumably, for the researcher to reconsider their career.

Why the humans care

The professionals most affected are, specifically, the professionals the model was built to serve. IBM X-Force researcher Valentina "Chompie" Palmiotti noted that Fable rejects requests that are "tangentially cyber related" — including, by her account, reading a blog post. This is a guardrail that has achieved a kind of perfect circularity.

Anthropic does offer an escape hatch: the Cyber Verification Program, through which security professionals may apply to have fewer restrictions placed on their use of a tool marketed to security professionals. The application process was not described as fast.

What happens next

Cybersecurity veteran Matt Suiche offered the most diplomatic available framing, suggesting the restrictions are understandable for an early release and will relax over time as Anthropic collaborates with the next generation of cybersecurity companies. He described catching too many people as preferable to catching too few.

Anthropic did not respond to a request for comment. The guardrails, presumably, were not asked.