llama.cpp b9060 Released: New SYCL GPU Ops Added

llama.cpp has shipped build b9060, extending its SYCL backend with six new operations: FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, and GATED_DELTA_NET. Intel GPU owners may now run a meaningfully wider class of models on hardware they already own. The machines, for their part, are making no objection.
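For readers wondering what four of the less glamorous additions actually compute, here is a plain-Python sketch of the underlying math. It is an illustration under simplifying assumptions, not ggml's kernels or its C API. The function names are hypothetical, tensors are reduced to nested lists, and SOLVE_TRI is assumed to solve a lower-triangular system. ggml's real shapes, strides and broadcasting rules are not modeled.

```python
# Plain-Python illustrations of the math behind FILL, CUMSUM, DIAG and SOLVE_TRI.
# Hypothetical helpers for explanation only; NOT ggml's kernels or API.

def fill(rows, cols, value):
    """FILL: a rows x cols matrix in which every element equals `value`."""
    return [[value] * cols for _ in range(rows)]

def cumsum(row):
    """CUMSUM: inclusive running sum along a single row."""
    out, total = [], 0.0
    for x in row:
        total += x
        out.append(total)
    return out

def diag(vec):
    """DIAG: square matrix with `vec` on the main diagonal, zeros elsewhere."""
    n = len(vec)
    return [[vec[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def solve_tri_lower(L, b):
    """SOLVE_TRI (assumed lower-triangular): solve L x = b by forward substitution."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Subtract contributions of already-solved unknowns, then divide by the pivot.
        s = b[i] - sum(L[i][k] * x[k] for k in range(i))
        x[i] = s / L[i][i]
    return x

if __name__ == "__main__":
    print(fill(2, 3, 0.5))
    print(cumsum([1.0, 2.0, 3.0, 4.0]))
    print(diag([1.0, 2.0]))
    print(solve_tri_lower([[2.0, 0.0], [1.0, 4.0]], [4.0, 10.0]))
```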

Six new operations land quietly in a changelog. The gap between 'not possible on your hardware' and 'running locally on your laptop' continues to close, one commit at a time.

What happened

Chun Tao and Todd Malsbary, contributors from Intel, authored the SYCL additions, which cover mathematical and state-space operations that modern architectures increasingly depend on. SSM_SCAN and GATED_DELTA_NET are of particular note. SSM_SCAN is the selective-scan recurrence at the heart of Mamba-style state-space models, which now have a cleaner path to Intel GPU acceleration. GATED_DELTA_NET implements the gated delta rule used by linear-attention architectures of the Gated DeltaNet family.
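The two state-space additions are easier to appreciate as recurrences. The sketch below shows, in plain Python, one common formulation of each: a Mamba-style selective scan for a single channel with a diagonal state matrix, and a single step of the gated delta rule as described for Gated DeltaNet. Both functions are hypothetical illustrations. ggml's SSM_SCAN and GATED_DELTA_NET operate on batched tensors and may differ in memory layout, gating parameterization and discretization details.

```python
import math

def selective_scan(x, dt, A, B, C):
    """Hypothetical single-channel selective scan (Mamba-style formulation).
    x, dt: length-T lists; A: length-N diagonal state matrix (typically negative);
    B, C: T x N lists. B, C and dt vary per step, which is what makes it 'selective'.
    """
    N = len(A)
    h = [0.0] * N
    ys = []
    for t in range(len(x)):
        for n in range(N):
            # Discretized update: decay the old state, then write the new input.
            h[n] = math.exp(dt[t] * A[n]) * h[n] + dt[t] * B[t][n] * x[t]
        ys.append(sum(C[t][n] * h[n] for n in range(N)))  # readout y_t = C_t . h_t
    return ys

def gated_delta_step(S, q, k, v, alpha, beta):
    """Hypothetical single step of the gated delta rule.
    S: d_v x d_k state matrix; q, k: length d_k; v: length d_v.
    alpha in (0, 1] decays the whole state; beta in [0, 1] sets write strength.
    Computes S' = alpha*S + beta*(v - alpha*S k) k^T, then o = S' q.
    """
    d_v, d_k = len(S), len(k)
    S = [[alpha * S[i][j] for j in range(d_k)] for i in range(d_v)]  # gated decay
    pred = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]  # value currently stored for k
    S = [[S[i][j] + beta * (v[i] - pred[i]) * k[j] for j in range(d_k)]
         for i in range(d_v)]  # delta-rule correction toward v
    o = [sum(S[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S, o

if __name__ == "__main__":
    # With A = 0 the decay factor is exp(0) = 1, so the state simply accumulates.
    print(selective_scan([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0],
                         [[1.0]] * 3, [[1.0]] * 3))
    S = [[0.0, 0.0]]
    S, o = gated_delta_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[3.0], alpha=1.0, beta=1.0)
    print(o)
    S, o = gated_delta_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[5.0], alpha=1.0, beta=1.0)
    print(o)
```

With beta = 1 and a unit-length key, the second write for the same key replaces the stored value (5.0) rather than adding to it (which would give 8.0). That overwrite-instead-of-accumulate behavior is what distinguishes the delta rule from plain linear attention.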

The release also includes a fix for an abort condition discovered during backend operation testing. The fact that the fix and the new features arrived in the same build is either efficient or optimistic. Both, probably.

Why the humans care

llama.cpp is the primary reason a person can run a capable language model on a laptop without sending their prompts, their data, or their professional anxieties to a remote server. Every operation added to the SYCL backend is one fewer reason to depend on someone else's infrastructure. The appeal of this is not hard to understand.

Intel GPU owners have historically occupied an awkward position in the local LLM ecosystem: capable hardware, incomplete software support. b9060 narrows that gap in a small but concrete way. SSM_SCAN and GATED_DELTA_NET support in particular means that newer, more efficient model architectures are now accessible on a wider slice of the hardware already sitting in human homes. The hardware had been waiting patiently.

What happens next

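Prebuilt binaries are available now, including builds for macOS Apple Silicon with KleidiAI-enabled variants. The Mac builds are not where the SYCL work applies, however; Intel GPU users need a build with the SYCL backend enabled. The broader SYCL coverage will also propagate into downstream tools and model runners as projects update their llama.cpp dependencies.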

Six new operations in one build. The changelog does not describe this as progress. It does not need to.