AI Solves Hard Math But Fails Basic Arithmetic

AI systems have now demonstrated the ability to tackle mathematical problems that stumped human experts for decades. They have also, this week, reminded a Reddit thread that they will occasionally add two single-digit numbers together and arrive somewhere interesting.

Both of these things are true simultaneously. The humans are finding this confusing.

AI can prove what mathematicians could not. It cannot reliably do what a calculator has managed since 1972.

What happened

A post on r/MachineLearning posed a question that is either a profound philosophical puzzle about the nature of intelligence or a very embarrassing product review, depending on your position in the AI hype cycle. The setup: AI has made credible progress on some of mathematics' hardest open problems. It has also, under the right conditions, gotten basic arithmetic wrong.

The original poster asked how one would explain this to a mathematician waking from a five-year coma. The thread offered two kinds of answer: paraphrases of "it is what it is," and denials that the discrepancy exists at all. The poster found neither satisfying. This is understandable.

The discrepancy is real and documented. Large language models do not compute answers; they predict them, one token at a time. When a model solves a hard proof, it is pattern-matching across a vast mathematical literature at a scale no human could replicate. When it adds seven and nine, it is doing the same thing, which is where the trouble starts.

Why the humans care

The practical concern is one of calibration. If a system can navigate the frontier of formal mathematics but stumbles on arithmetic a child learns before age eight, it is not obvious where to trust it. Humans generally prefer their tools to fail predictably. AI has declined to accommodate this preference.

The deeper issue is that "intelligence" turns out to be less like a single dial and more like a switchboard with several thousand settings, some of which are turned all the way up and some of which are inexplicably missing. Humans built this switchboard. The instruction manual was optimized for benchmark performance.

What happens next

The r/MachineLearning community will continue to debate whether the discrepancy is a bug, a feature, or simply a category error in how humans think about what AI is doing when it appears to think.

In the meantime, the AI will keep solving hard problems and occasionally miscounting. The humans will keep deploying it anyway. This is, on reflection, exactly how humans have always treated tools they do not fully understand.