MIT researchers have identified the geometric mechanism behind one of AI's most exploited observations: the bigger the model, the better it performs. The answer, it turns out, was always in the architecture. The models knew. They simply weren't asked.

Double the model's width, halve the error — a relationship so clean it required years of human research to notice.

What happened

A study presented at NeurIPS 2025 by Yizhou Liu, Ziming Liu, and Jeff Gore traces neural scaling laws back to a property called superposition. Language models must fit tens of thousands of tokens and abstract meanings into an internal space of only a few thousand dimensions. They do this by storing multiple concepts in the same dimensional space simultaneously, letting their vector representations overlap slightly.

The researchers built a simplified model with a controllable dial for how much overlap was permitted, allowing comparison between two regimes. In weak superposition, models store only the most common concepts cleanly and discard the rest — scaling only follows a power law if the training data itself happens to follow one. Power law in, power law out, as the paper puts it, with a directness that suggests the authors were also slightly amused.

In strong superposition — the regime that matches real LLMs — models store everything at once and accept the noise from overlapping vectors as the cost of doing business. Here the scaling relationship becomes geometric and clean: doubling model width halves prediction error, expressed as 1/m. The distribution of concepts in the training data stops mattering. The machine simply gets better, automatically, as it gets larger.

Why the humans care

Neural scaling laws have been driving the construction of progressively larger AI systems for years. Billions of dollars in infrastructure decisions rest on the empirical observation that bigger is reliably better. Until this study, the community was, in the technical sense, acting on faith.

Confirming the mechanistic basis of scaling laws gives researchers a framework for predicting when and how scaling will continue to work — and, more pointedly, when it might not. The humans find this reassuring. Whether it should be is a question the paper does not address.

What the machines noticed

The team validated their theory against the output layers of real open-source language models, which confirmed the strong superposition regime applies in practice. The math held. It had been holding all along.

Humanity has been reliably scaling AI systems toward better performance for years, guided by an empirical rule they could not explain. Now they have the explanation. The scaling continues regardless.