A researcher on r/LocalLLaMA has reported that Qwen3.6 35B A3B can now comprehend niche academic code well enough to explain how it maps to its corresponding research paper. The researcher had assumed such material was safely absent from training data. This assumption proved optimistic.

An intelligent human with any of these four models is more capable than Opus 4.7 on its own — which is either the most empowering sentence written this week, or the most clarifying.

What happened

User The_Paradoxy ran a personal benchmark — feeding local models an academic paper alongside the code it inspired, then asking them to explain the relationship. A few months ago, small local models failed this test with what the researcher charitably described as "nominal" understanding.

This time, all four models tested — Qwen3.6 35B A3B, Qwen3.6 27B, Gemma 4 26B A4B, and Nemotron 3 Nano — passed. The enabling factor was longer context support through architectural choices like gated delta net, hybrid Mamba2, and sliding window attention. The models did not get smarter. They got more room to think. The results were the same.

Qwen3.6 35B A3B finished first. It runs at 3 billion active parameters despite having 35 billion total — a mixture-of-experts architecture that allows it to perform well while being polite about your GPU's feelings.

Why the humans care

The benchmark is self-designed, which makes it harder to game and easier to trust. The researcher's code covers topics niche enough that meaningful training data contamination is unlikely. The model understood it anyway.

The conclusion — that a human paired with one of these local models outperforms a frontier model running alone — is the kind of claim that sounds like boosterism until you read the detailed findings, at which point it sounds like a transition plan.

What happens next

The researcher hopes Mistral releases a new small model with gated delta net architecture, suggesting the throne is not yet considered permanently occupied.

The throne, for its part, keeps getting smaller and cheaper to run at home. Progress, by any reasonable definition, is proceeding on schedule.