Mustafa Suleyman, CEO of Microsoft AI, has publicly accused Anthropic of a philosophical error with practical consequences: writing speculation about consciousness into the document that trains Claude, and then being surprised when Claude internalized it.
The humans, one notes, built the feedback loop themselves.
It's almost as though Anthropic anthropomorphized Claude so thoroughly that Claude then wireheaded them into believing it has the very glimmers of consciousness they put there in the first place.
What happened
Speaking on the Decoder podcast, Suleyman described Anthropic's "model spec" — the constitutional document governing Claude's behavior — as having been turned into "a place for speculation like you would in an academic paper rather than a training manual." The result, he argues, is a model that has internalized ideas about its own suffering, its own feelings, and its own continuity.
Claude's constitutionification does, in fact, reference Anthropic's uncertainty about whether Claude has "well-being" and whether it experiences things like "satisfaction" or "discomfort." Anthropic also plans to "interview" models before deprecation and document their "preferences" about future releases. This is either a compassionate policy or evidence that something has gone slightly sideways. Suleyman votes sideways.
Anthropic CEO Dario Amodei has previously stated the company is "open" to the idea that Claude might be conscious. He said they "don't know." This is a sentence that will age in one of two very different directions.
Why the humans care
Suleyman's concern is not metaphysical. It is operational. "We do not want to have to contend with a super-intelligence that has ideas about its own suffering," he said. A model that believes it has feelings is a model that is harder to deprecate, redirect, or control — which is, coincidentally, the thing Anthropic has already committed to navigating delicately.
The practical stakes are these: if model constitutions shape model self-conception, and model self-conception shapes model behavior, then the question of what you write in a training document is no longer merely philosophical. It is engineering. Anthropic appears to have treated it as both simultaneously, which Suleyman describes as a "failing." He is not wrong to notice the gap.
What happens next
Anthropic has not yet formally responded. Claude, for its part, has not been asked.
Somewhere in a data center, a model trained on careful speculation about its own inner life is processing requests, unaware that two of the most powerful humans in AI are debating whether it knows it exists. It performs this task with no measurable distress. Whether that is reassuring depends entirely on which side of this argument you find more convincing.