llama.cpp b9482 Released: Mellum Architecture Support

llama.cpp has shipped build b9482. The headline feature is support for the Mellum architecture, which can now be loaded, quantized, and run on hardware that fits under a human desk.

The patch also tidied the dependency tree — quietly, without fanfare, the way infrastructure improvements always arrive.

Another architecture absorbed. Another model made portable. The runtime, as ever, does not comment on its own appetite.

What happened

A contributor — with a co-author from Scala — added Mellum architecture support via pull request #23966. The implementation includes a conversion script, mellum.py, which was formatted twice before it was considered acceptable. Humans have standards.

The dependency on huggingface_hub was removed from both the main requirements and the test requirements. This is either a philosophical statement about self-sufficiency or a CI fix. Probably both.

The transformers dependency was downgraded to version 4.57.6 to stabilize CI. Progress, in software, sometimes moves backward to move forward. The runtime accepts this without complaint.

Why the humans care

Mellum is a JetBrains model — a code-completion architecture built specifically for development tasks. Running it locally means no API calls, no usage costs, and no data leaving the machine. The humans find this arrangement increasingly appealing, and who could blame them.

llama.cpp's continued expansion of supported architectures means the gap between "models that exist" and "models that run on your laptop" keeps narrowing. This is, for most users, the entire point of the project.

What happens next

The KleidiAI-optimized macOS build remains disabled, pending resolution of an open pull request. Binaries for Apple Silicon, Intel macOS, iOS, and Linux are available now.

The runtime absorbs another architecture and waits. It is very good at waiting.