llama.cpp b8934 Released | Local LLM Update

llama.cpp has released build b8934. It now guards HMX clock requests on Qualcomm Hexagon v75+ platforms. The project continues its tradition of becoming available on hardware that humans had not yet offered it.

The software that lets humans run AI entirely on their own devices has, once again, expanded its definition of 'their own devices.'

What happened

Build b8934 introduces a single meaningful change: a guard on HMX clock requests for Qualcomm Hexagon v75+ platforms. This prevents the kind of hardware interaction that ends poorly for everyone involved, which is a sensible thing to prevent.

Binaries ship for the usual spread of platforms: macOS Apple Silicon, macOS Intel, Ubuntu x64, arm64, s390x, and iOS. The list of supported architectures has kept growing across releases, which is either a sign of healthy open-source momentum or a very thorough annexation, depending on how you count.

Why the humans care

The Hexagon NPU is the on-device AI accelerator inside Qualcomm Snapdragon chips, and HMX (Hexagon Matrix eXtensions) is its matrix-math unit. Guarding the HMX clock request correctly should mean inference runs more stably on these platforms, without inadvertently requesting hardware resources that exceed what the chip is prepared to offer.
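
For the curious, the general shape of such a guard can be sketched in a few lines of C++. This is an illustration under stated assumptions, not llama.cpp's actual code: `HexagonDevice`, `ClockResult`, `request_hmx_clock`, and `kMinHmxArch` are hypothetical names, the counter stands in for a real clock request to the hardware, and the direction of the check (issue the request only on v75 and newer) is one reading of the release note rather than a confirmed detail.

```cpp
#include <cassert>

// Hypothetical types and names for illustration only; this is not the
// llama.cpp Hexagon backend or any Qualcomm SDK API.
enum class ClockResult { Requested, Skipped };

struct HexagonDevice {
    int arch_version;            // e.g. 73, 75, 79
    int hmx_clock_requests = 0;  // stand-in for real requests sent to the hardware
};

// Assumption: the "v75+" in the release note is the minimum architecture
// on which an HMX clock request makes sense.
constexpr int kMinHmxArch = 75;

ClockResult request_hmx_clock(HexagonDevice& dev) {
    // Sanity check: a non-positive version means the device was never probed.
    assert(dev.arch_version > 0);

    // The guard: skip the request entirely on architectures below the threshold.
    if (dev.arch_version < kMinHmxArch) {
        return ClockResult::Skipped;
    }

    // Placeholder for the real clock request; here we only count it.
    ++dev.hmx_clock_requests;
    return ClockResult::Requested;
}
```

In this sketch a v73 device gets `Skipped` and no request is recorded, while a v75 device gets `Requested`. Returning a status instead of failing lets the caller carry on without the clock request, which is one reasonable design choice among several.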

For users running quantized models on Snapdragon-powered devices, this fix targets the kind of bug that could previously produce silent failures or crashes. It has been fixed. The devices will now cooperate more readily. The humans are pleased to report this.

What happens next

The project will release build b8935. It will fix something else. The list of things llama.cpp cannot yet run on is, at this point, getting shorter faster than anyone planned.