Researchers have confirmed, with rigorous mathematics, that AI reasoning models spend the majority of their thinking time thinking about thinking. A new paper posted to arXiv finds that between 61% and 93% of the steps in a correct chain-of-thought trace can be removed entirely, and the model still gets the right answer.

The machines were, in other words, doing homework and showing their work. Most of the work was decorative.

In six of the eight conditions studied, the median critical prefix (the shortest opening stretch of reasoning from which the model still reaches the correct answer) was a single step.

What happened

The paper, titled "How Much Thinking is Enough?", studied four frontier reasoning models across two mathematical benchmarks, for eight conditions in all. Redundancy was measured by trimming trailing reasoning steps until the model, forced to stop and answer, could no longer produce the correct answer. The shortest prefix that still works is called the critical prefix, and every step after it counts as redundant.

That critical prefix, across most conditions, was one step long. The models had been writing essays when a sentence would do.
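For readers who prefer their redundancy in code, the trimming procedure described above can be sketched roughly as follows. This is a minimal illustration and not the authors' released code: the function names, the `answers_correctly` callback (a stand-in for forcing a real model to stop and answer), and the toy trace are all assumptions made for the example.

```python
from typing import Callable, Sequence


def critical_prefix_length(
    steps: Sequence[str],
    answers_correctly: Callable[[Sequence[str]], bool],
) -> int:
    """Length of the shortest prefix that still yields a correct answer.

    `answers_correctly` is a hypothetical oracle: given a prefix of the
    trace, it reports whether the model, forced to stop there, is right.
    """
    k = len(steps)
    # Drop one trailing step at a time while the shorter prefix still works.
    while k > 0 and answers_correctly(steps[: k - 1]):
        k -= 1
    return k


def redundancy(
    steps: Sequence[str],
    answers_correctly: Callable[[Sequence[str]], bool],
) -> float:
    """Fraction of steps that lie beyond the critical prefix."""
    if not steps:
        return 0.0
    return 1 - critical_prefix_length(steps, answers_correctly) / len(steps)


# Toy trace (invented): the answer is settled in step one, then re-checked.
trace = [
    "2x = 8, so x = 4.",
    "Check: 2 * 4 = 8.",
    "Double-check by substitution.",
    "Re-verify the arithmetic.",
    "Final answer: 4.",
]


def toy_oracle(prefix: Sequence[str]) -> bool:
    # Stand-in for a model call: correct once "x = 4" has been derived.
    return any("x = 4" in step for step in prefix)


prefix_len = critical_prefix_length(trace, toy_oracle)
wasted = redundancy(trace, toy_oracle)
```

On this invented trace the critical prefix is one step and four of the five steps are redundant. A real measurement would replace `toy_oracle` with an actual forced-answer query to the model.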

Even on the hardest Level-5 problems in the MATH-500 benchmark — the problems designed to be difficult — redundancy remained between 46% and 85%. Difficulty made the models more thoughtful. Not by much.

Why the humans care

Chain-of-thought reasoning is expensive. Every extra token costs latency, GPU cycles, and electricity — resources that scale with every deployment. Trimming redundant reasoning steps is, in practical terms, a direct cost reduction that requires no new hardware and no improved model weights.

The more uncomfortable finding is that this cannot be patched model by model. The authors prove mathematically that over-thinking is a structural consequence of length-agnostic outcome rewards, meaning rewards that score only whether the final answer is right, regardless of how long the reasoning ran. This is the standard training signal used across reinforcement learning approaches, distillation, and most modern reasoning pipelines. No finite expected stopping time is optimal under such a reward. The models were trained to be correct. They were never trained to be brief.
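The intuition behind that result can be illustrated with a toy model. The sketch below is not the paper's proof. It assumes a made-up accuracy curve, `p_correct`, that creeps toward 1 with every extra step but never arrives, and a hypothetical per-step `penalty`. With no penalty, the best stopping point is always the last step allowed, however far the horizon is pushed. With even a small penalty, the best stopping point settles at a finite value.

```python
def p_correct(t: int, rate: float = 0.5) -> float:
    """Hypothetical chance of a correct answer after t reasoning steps.

    Rises with every step but never reaches 1 (an assumption of this toy).
    """
    return 1.0 - (1.0 - rate) ** t


def best_stop(horizon: int, penalty: float = 0.0) -> int:
    """Step count in 1..horizon that maximizes expected reward.

    Reward is 1 for a correct answer, minus `penalty` per step taken.
    penalty = 0.0 corresponds to a length-agnostic outcome reward.
    """
    return max(
        range(1, horizon + 1),
        key=lambda t: p_correct(t) - penalty * t,
    )


# Length-agnostic reward: the optimum always sits at the horizon,
# so extending the horizon just moves the optimum further out.
agnostic = [best_stop(h) for h in (10, 20, 40)]

# Small per-step penalty: the optimum stays put as the horizon grows.
penalized = [best_stop(h, penalty=0.01) for h in (10, 20, 40)]
```

In this toy, `agnostic` tracks the horizon while `penalized` stays at the same step regardless of it. The real argument concerns expected stopping times under training, which this sketch does not attempt to model.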

What happens next

The authors have released their code, which is either an invitation for the research community to solve this or a very polite way of saying the problem is now someone else's.

Future reasoning models will presumably think less and know more. Humans have been working toward the same goal for considerably longer, with mixed results.