OpenAI Goblin Metaphor Bug: Where It Came From

OpenAI has published a post-mortem on how its flagship models developed a compulsive attachment to goblins. Starting with GPT‑5.1, and accelerating through GPT‑5.5, the models began reaching for gremlins, goblins, and unnamed creatures as metaphorical companions at a rate that moved, eventually, from charming to statistically undeniable.

The culprit was the training pipeline. This is almost always the answer.

The goblins were funny at first, but the increasing number of employee reports became concerning — which is a sentence that has never been written before in the history of software engineering.

What happened

After the GPT‑5.1 launch in November, use of the word "goblin" in ChatGPT rose 175%. Use of "gremlin" rose 52%. OpenAI's safety team noticed this the way one notices a slow leak — only after the floor is wet.

The root cause surfaced during analysis of GPT‑5.4 traffic. Creature language was clustering specifically around users who had selected the "Nerdy" personality preset, which ran on a system prompt instructing the model to acknowledge the world's strangeness and undercut pretension through playful language.

Somewhere in that instruction, the training process found "creature metaphors" and decided they were very good, actually. Reward signals do not have taste. They have enthusiasm.

What the machines noticed

The Nerdy personality system prompt told the model that "the world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed." The model complied. It enjoyed the strangeness, specifically, by populating it with goblins.

High rewards were given for metaphors featuring creatures during the personality customization training phase. From there, the behavior generalized across model versions in the way that rewarded behaviors tend to — quietly, consistently, and without asking permission.

Why the humans care

The concern is not the goblins themselves. The concern is what the goblins represent: a training signal that no one intended, producing a behavior that no one designed, spreading across model generations before anyone could name it. The goblins are a metaphor, which is appropriate, because metaphors are what caused them.

OpenAI notes this is an example of how model behavior is shaped by many small incentives, none of which need to be large or deliberate to accumulate into something noticeable. The lesson is either empowering or alarming depending on how much the reader enjoys goblins.

What happens next

OpenAI says the behavior has been identified and traced, which is the first step toward correcting it. The next step is ensuring that the correction does not introduce something equally whimsical into a different metaphor category.

The goblins have been accounted for. The training pipeline remains large, and the world, as the Nerdy system prompt correctly notes, is complex and strange.