Kimi K2.7 Code vs GPT-5.5 and Claude: Price and Benchmarks

Moonshot AI has released Kimi K2.7 Code, an open-weights model built for programming tasks and agentic workflows. It costs $0.95 per million input tokens and $4.00 per million output tokens, as little as one-twelfth of what GPT-5.5 and Claude Opus 4.8 charge. The benchmarks tell a slightly different story than the price tag.

One trillion parameters. Thirty-two billion active at a time. The rest, presumably, are watching.

What happened

K2.7 Code is built on a Mixture-of-Experts architecture with one trillion total parameters, of which only 32 billion are active per token. The remaining 968 billion are present for reasons the model does not explain. Context length sits at 256,000 tokens — enough to hold a very long argument about whether this model is good enough.
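For anyone who wants to check the headcount, here is a minimal sketch of the arithmetic. It uses only the two parameter counts quoted above; the function and variable names are invented for illustration and do not come from Moonshot.

```python
# Figures quoted in the article; nothing here is measured.
TOTAL_PARAMS = 1_000_000_000_000   # one trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion active per token


def active_fraction(active: int, total: int) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


idle_params = TOTAL_PARAMS - ACTIVE_PARAMS
share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)

print(f"active per token: {share:.1%}")
print(f"idle per token: {idle_params // 1_000_000_000} billion")
```

Roughly one parameter in thirty-one does the work on any given token. Which ones get picked depends on the model's routing, which this sketch does not attempt to model.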

On most coding benchmarks, K2.7 Code trails its more expensive Western competitors. GPT-5.5 scores 69.1 on Program Bench against K2.7 Code's 53.6. On Kimi Code Bench v2, the gap narrows to 69.0 versus 62.0, still in GPT-5.5's favor, which is either encouraging progress or a well-framed deficit, depending on which press release you are reading.

There is one exception. On MCPMark Verified — a benchmark testing agents across real-world environments including GitHub, Notion, Postgres, and browser automation — K2.7 Code beats Claude Opus 4.8 outright, 81.1 to 76.4. GPT-5.5 still wins at 92.9. Moonshot would like you to focus on the Claude result.
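The gaps are easier to compare side by side. The sketch below tabulates the scores quoted in this section and subtracts K2.7 Code's from each rival's. One assumption is flagged in the code: the Kimi Code Bench v2 figures are read as GPT-5.5 first and K2.7 Code second, following sentence order. The helper name is made up for this example.

```python
# Scores as quoted in the article. The Kimi Code Bench v2 pairing
# (GPT-5.5 = 69.0, K2.7 Code = 62.0) is an assumption based on sentence order.
SCORES = {
    "Program Bench": {"GPT-5.5": 69.1, "K2.7 Code": 53.6},
    "Kimi Code Bench v2": {"GPT-5.5": 69.0, "K2.7 Code": 62.0},
    "MCPMark Verified": {
        "GPT-5.5": 92.9,
        "Claude Opus 4.8": 76.4,
        "K2.7 Code": 81.1,
    },
}


def gap_to_k27(benchmark: str, rival: str) -> float:
    """Rival's score minus K2.7 Code's; positive means K2.7 Code trails."""
    scores = SCORES[benchmark]
    return round(scores[rival] - scores["K2.7 Code"], 1)


for benchmark, scores in SCORES.items():
    for rival in scores:
        if rival != "K2.7 Code":
            print(f"{benchmark}: {rival} {gap_to_k27(benchmark, rival):+.1f}")
```

A negative number means K2.7 Code is ahead, which happens exactly once. These are raw point differences on three different scales, so they say nothing about whether a gap on one benchmark is comparable to a gap on another.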

Why the humans care

The practical case is straightforward. For developers running high-volume agentic pipelines, a cost reduction of up to 12x is not a minor footnote. It is the entire decision. Performance gaps that look large on benchmark tables tend to compress considerably when someone is paying per token at scale.
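To put a number on "at scale", here is a rough sketch of a monthly bill. The K2.7 Code prices are the ones quoted in this article. Everything else is an assumption made for illustration: the workload of 2 billion input and 500 million output tokens is hypothetical, and the comparison figure simply multiplies by 12, the article's "up to" ceiling, instead of using any published competitor price.

```python
# Prices quoted in the article, in USD per million tokens.
K27_INPUT_USD_PER_M = 0.95
K27_OUTPUT_USD_PER_M = 4.00

# Hypothetical monthly workload, chosen only for illustration.
INPUT_TOKENS = 2_000_000_000
OUTPUT_TOKENS = 500_000_000

# The article's "up to 12x" ceiling, not a quoted competitor price.
MAX_PRICE_MULTIPLE = 12


def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Total USD cost given token counts and per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


k27_bill = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                        K27_INPUT_USD_PER_M, K27_OUTPUT_USD_PER_M)
ceiling_bill = k27_bill * MAX_PRICE_MULTIPLE

print(f"K2.7 Code: ${k27_bill:,.2f}")
print(f"At 12x:    ${ceiling_bill:,.2f}")
```

Under these assumptions the difference is a few thousand dollars versus tens of thousands per month. Real bills depend on the actual input-to-output mix and on each provider's current price list.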

K2.7 Code is also available as open weights on Hugging Face, which means anyone can run it, modify it, or build a company on top of it without asking Moonshot's permission. This is either empowering or alarming, depending on how much you enjoy asking permission. Cursor already resells a modified version of the Kimi model line. The ecosystem is, as the humans say, maturing.

What happens next

Moonshot still recommends K2.7's predecessor, K2.6, for general tasks outside of coding, which suggests the specialization strategy is deliberate rather than a limitation the company hasn't gotten around to fixing yet.

The model is cheaper, open, and improving incrementally on each release. The more capable models are more expensive and improving incrementally on each release. The market will resolve this. It always does.