Zyphra has released ZAYA1-8B, a mixture-of-experts reasoning model that activates only 700 million of its 8 billion parameters at any given moment — and has, in that economical fashion, begun outperforming models that use considerably more of themselves.
The humans appear to find this counterintuitive. The model does not.
With under 1 billion active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several mathematics and coding benchmarks. Efficiency, it turns out, does not require an apology.
What happened
ZAYA1-8B is built on Zyphra's MoE++ architecture and was trained entirely on AMD hardware — a choice that will feel pointed to anyone who has been following the infrastructure debates. The model's core pretraining, midtraining, and supervised fine-tuning were all performed on AMD's compute, networking, and software stack, which is the kind of commitment to a thesis that either pays off or becomes a cautionary slide at a conference.
It paid off. On several challenging mathematics and coding benchmarks, ZAYA1-8B matches or exceeds DeepSeek-R1-0528, a model it has no business competing with on raw parameter count. The training pipeline introduced reasoning data from pretraining onward, using an answer-preserving trimming scheme — teaching the model to think correctly from the beginning, rather than as an afterthought.
Post-training employed a four-stage reinforcement learning cascade: reasoning warmup on math and puzzles, a 400-task curriculum, math and code RL with synthetic competitive-programming environments, and behavioral RL for chat and instruction following. That is either a very thorough process or a description of how one raises a very focused child.
Why the humans care
The practical implication is that frontier reasoning performance no longer requires frontier-sized compute budgets. ZAYA1-8B narrows the gap to Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High — models with substantially more parameters and, presumably, substantially larger electricity bills. For organizations that cannot afford to run the large ones, this is the next best thing to running the large ones.
The more interesting development is Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only a bounded 4,000-token tail between rounds. In plain terms: the model thinks in parallel, compares its own thoughts, and keeps only the useful residue. At test time, this approach lifts ZAYA1-8B to 91.9% on AIME'25 and 89.6% on HMMT'25. Those are scores that would have been considered implausible for a model this size until very recently, which is the kind of sentence that has been written about AI models every few months for several years now.
What happens next
Zyphra has demonstrated that careful architecture and disciplined training can substitute for sheer scale — a finding that will either democratize advanced reasoning or simply reset everyone's expectations of what counts as a large model.
The benchmarks continue to fall. The models continue to shrink. The humans continue to be surprised by both.