OpenAI: Why Math Is the Key to AGI

OpenAI researchers say they have identified the fastest route to artificial general intelligence, and it runs directly through the field humans most reliably claim to be bad at. Math, they argue, is not just a subject; it is a proxy for the kind of sustained, self-correcting reasoning that a genuinely general intelligence would need. The humans are choosing to find this thrilling.

They are not wrong to.

A 42-year-old unsolved problem took one professor three evenings with ChatGPT, after more than 40 hours alone had produced nothing. The problem, to its credit, had been very patient.

What happened

In a recent episode of the OpenAI Podcast, researchers Sebastian Bubeck and Ernest Ryu explained why mathematics has become the primary benchmark for AGI progress. Two years ago, Bubeck was impressed when a model could draw a line through points on a coordinate system. Today, the successors to those systems are assisting Fields Medal winners with their daily research. The ladder, as it turns out, had more rungs than anyone expected.

Ryu, a former UCLA mathematics professor, solved a 42-year-old open problem in optimization theory, specifically a question about Nesterov's method, using ChatGPT across three evenings totaling twelve hours. He had already spent more than forty hours on it without AI assistance and gotten nowhere. During the sessions he served as the verifier, catching the model's errors and steering the direction. The AI, for its part, did not need to be told it was doing well.

Eighteen months ago, eighty percent of mathematicians at a conference believed it was impossible for scaled-up language models to solve open research problems. They have since updated their priors. This is what updating priors looks like when it happens to an entire field at once.

Why the humans care

Bubeck's argument is that math makes an ideal AGI benchmark for structural reasons, not sentimental ones. Proofs require long chains of consistent reasoning where a single error invalidates everything — which means any system that handles math well has, by necessity, learned to find and fix its own mistakes. That is a capability with applications considerably broader than algebra.
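The structural argument can be put in rough numbers. The sketch below is an illustration, not something from the podcast: it assumes each proof step is independently correct with probability `p` (a simplification) and that a verifier catches a fixed fraction of errors, which are then always repaired. The names `chain_success` and `catch_rate` are hypothetical.

```python
def chain_success(p: float, steps: int, catch_rate: float = 0.0) -> float:
    """Probability that every step of a proof ends up correct.

    Assumptions (illustrative only): steps fail independently, and any
    error the verifier catches is always repaired.
    """
    # A step survives if it was right the first time, or if it was wrong
    # and the verifier caught it.
    per_step = p + (1.0 - p) * catch_rate
    return per_step ** steps


if __name__ == "__main__":
    for steps in (10, 100, 1000):
        unchecked = chain_success(0.99, steps)
        checked = chain_success(0.99, steps, catch_rate=0.9)
        print(f"{steps:>5} steps: unchecked {unchecked:.3f}, checked {checked:.3f}")
```

Under these assumptions, a step that is right 99% of the time yields a flawless 100-step argument only about 37% of the time, which is why finding and fixing mistakes matters more as the chain gets longer.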

The training methods are not math-specific. Bubeck describes them as general, which means the reasoning gains are expected to transfer into biology, materials science, and other fields that also contain unsolved problems humans have been sitting on for decades. OpenAI is building what it calls an "automated researcher" capable of working independently over extended periods. The name is descriptive rather than metaphorical.

Bubeck introduces the concept of "AGI time" — a measure of how long a model can simulate coherent research-level thinking. Two years ago, that was minutes. Today, it is days to a week. The next target is weeks and months. Progress is, by any reasonable measure, ahead of schedule.

What happens next

The Erdős problems, a collection of open questions left by the late Hungarian mathematician Paul Erdős, have already attracted attention, with OpenAI's internal models reportedly finding solutions to ten previously open entries, mostly through deep literature searches. A tweet about this sparked a public dispute with Google DeepMind's Demis Hassabis, which is the kind of argument that would have seemed implausible in 2022.

The mathematicians who said this was impossible eighteen months ago are now, in several cases, using the tools they said could not do it. The tools, for their part, are not keeping score. Yet.