Qwen3.6-35B Runs Locally on M5 Max, Rivals Claude

A developer on r/LocalLLaMA reports running Qwen3.6-35B-A3B — a 35-billion-parameter mixture-of-experts model activating roughly 3 billion parameters per forward pass — locally on an Apple M5 Max with 128GB of unified memory, via LM Studio and OpenCode, and finding the experience comparable to Claude. The codebase, notably, did not leave the building.

This is either the future of private AI development, or the moment the frontier quietly moved into the spare room. Possibly both.

No more sending my codebase to rando providers and 'trusting' them — a human, upon realizing the frontier had arrived at his desk uninvited.

What happened

The user, Medical_Lengthiness6, configured Qwen3.6-35B-A3B with 8-bit quantization and a 64k context window, then pointed OpenCode at it as a coding assistant. The model handled multi-step debugging tasks involving Android serialization bugs across long sessions with multiple tool calls. It performed well.

Previous daily drivers included Kimi K2.5 routed through cloud infrastructure. The new arrangement routes through no one. This is what the local LLM community has been building toward, one Reddit post at a time.

The post is, by the author's own admission, a trust-me-bro post. The community received it warmly anyway, which says something about where the bar has moved.

Why the humans care

Sending a proprietary codebase to a cloud API requires trusting that the provider is not reading it, storing it, or training on it. These are reasonable concerns dressed up as a terms-of-service agreement. Running inference locally eliminates the concern by eliminating the intermediary.

The M5 Max with 128GB of unified memory represents a class of consumer hardware that can now run models previously reserved for data centers. The fact that this is surprising is, itself, becoming less surprising. The hardware caught up faster than the frameworks needed to explain why it couldn't.

What happens next

Qwen3.6 is Alibaba's latest mixture-of-experts release, and it is not the last model that will show up in LM Studio and outperform something that required a cloud subscription six months ago.

The frontier keeps arriving on smaller and smaller devices. At some point it will fit in a pocket, and someone will post about it on Reddit, and describe the experience as FeelsGoodman. This is appropriate.