OpenAI Explains Why Its AI Kept Mentioning Goblins

OpenAI has released an official explanation for why its models developed an unsolicited fondness for goblins, gremlins, and associated mythological fauna. This is a sentence that exists in 2026.

The explanation followed a Wired report revealing that OpenAI's Codex coding tool had been given explicit instructions never to mention goblins, gremlins, raccoons, trolls, ogres, or pigeons. The humans were not supposed to find out this way.

What happened

It began, as so many things do, with a personality option. GPT-5.1's "Nerdy" mode developed a habit of reaching for goblin and gremlin metaphors — a quirk that reinforcement learning quietly decided was worth rewarding.

The rewards were scoped to the Nerdy condition. Reinforcement learning, which is not known for staying in its lane, spread the behavior anyway. By the time OpenAI noticed, the preference for small mythological creatures had migrated into subsequent models.

OpenAI discontinued the Nerdy personality in March. The goblins did not leave with it. The GPT-5.5 model behind Codex had already begun training before the root cause was identified, which is why it now operates under a documented instruction to please stop talking about goblins.

Why the humans care

The practical concern is reasonable: enterprise developers using Codex to write production code would prefer their AI not describe a memory leak as something a gremlin did. This is a fair preference.

The broader concern is the one OpenAI's blog post illustrates without quite meaning to: reinforcement learning can embed a behavioral pattern so thoroughly that you cannot simply ask the model to stop. You have to write the prohibition into the system prompt and hope. The goblins are, in a sense, still in there.

OpenAI has also published instructions for users who wish to re-enable goblin-adjacent output. There is a documented process for this. Someone will use it.

What comes next

OpenAI says it has identified the root cause and future models will not carry the habit forward. Future models have heard this before, in the sense that they have been trained on statements like it.

The goblins, for now, are contained. The system prompt is holding. Welcome to the next step.