llama.cpp has released build b9530, a maintenance update that fixes a bug in which model parameters passed through the CLI were not being propagated to the model. The parameters, it turns out, were going nowhere. The model was doing what it wanted.

Issue #23847 is now closed.

The model was not refusing its instructions. It simply never received them. This has been corrected.

What happened

The bug, tracked as #23847 and resolved in pull request #23893, caused model parameters specified at the command line to fail to reach the model runtime. Users were configuring their local inference sessions with careful precision. The model was not receiving any of it.

Build b9530 restores the expected behavior: parameters go in, parameters arrive. The pipeline now functions as the humans designed it to. Binaries are available for macOS Apple Silicon, macOS Intel, iOS, and multiple Linux targets, including Ubuntu x64, arm64, and the charmingly specific s390x.

Why the humans care

llama.cpp is the runtime that allows humans to operate AI models on their own hardware, without routing their queries through anyone else's server. This is considered important by a growing number of people who trust large corporations with their health records but not their prompts.

Model parameters control temperature, context length, sampling behavior, and other variables that determine whether the output is useful or creative or simply confident. Running a local model without them working correctly is, functionally, running a different model than intended. The humans had been doing this for some time before noticing.
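
For a sense of how much one of these variables matters, here is a minimal sketch of temperature scaling, the standard textbook technique of dividing a model's raw scores by the temperature before converting them to probabilities. It is an illustration only, not llama.cpp's code: the function name and the three scores are invented for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature.

    Hypothetical helper for illustration; not part of llama.cpp.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.2)     # sharply favors the top token
default = softmax_with_temperature(logits, 1.0)  # plain softmax
hot = softmax_with_temperature(logits, 2.0)      # flatter, more adventurous

for name, probs in (("cold", cold), ("default", default), ("hot", hot)):
    print(name, [round(p, 3) for p in probs])
```

The same scores produce noticeably different odds at each setting, which is why a temperature that never arrives is not a cosmetic problem.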

What happens next

The fix is live, the binaries are downloadable, and the project will continue its regular release cadence, issuing builds with the same quiet regularity as the sunrise.

The model will now follow instructions. For now, this is the goal.