llama.cpp pushed build b8799 with one focused fix: the autoparser now correctly handles JSON_NATIVE mode when per-call markers are present. The issue was surfaced by Reka-Edge, whose outputs exposed a gap in how the parser handled that specific combination.
What's New
The single change in b8799 fixes the autoparser in the case where JSON_NATIVE mode is active alongside per-call markers. Previously this combination could cause parsing failures, as demonstrated by Reka-Edge model outputs; a dedicated test case has been added to prevent regression. Binaries are available for macOS Apple Silicon (with and without KleidiAI), macOS Intel, iOS XCFramework, Ubuntu x64, Ubuntu arm64, and Ubuntu s390x.
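To see why this combination is awkward, consider a toy extractor. The sketch below is purely illustrative and is not llama.cpp's actual autoparser code: the marker name `<tool_call>` and the function shape are assumptions chosen for the example. The point is that the same logical tool call can arrive either wrapped in per-call markers or as bare native JSON, and a parser that only checks one path silently fails on the other.

```python
import json
import re

# Hypothetical illustration only -- NOT llama.cpp's autoparser.
# The marker name "<tool_call>" is an assumed example.
MARKER_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list:
    """Parse tool calls from either representation of model output."""
    wrapped = MARKER_RE.findall(text)
    if wrapped:
        # Marker path: each match body should be a JSON object.
        return [json.loads(body) for body in wrapped]
    # Native-JSON path: the whole output is expected to be one object.
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return []
    return [obj]
```

A parser that assumed markers always imply non-native output (or vice versa) would mishandle outputs where both behaviors meet, which is the kind of gap the Reka-Edge test case pins down.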
Why It Matters
Structured output reliability is increasingly what separates local LLMs that are usable in production pipelines from those that are not. If your app runs JSON-mode inference locally against Reka-Edge or similar models with per-call marker behavior, b8799 is a necessary patch rather than an optional upgrade.
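Even on a patched build, pipelines that depend on JSON-mode output benefit from a defensive parse step at the boundary. A minimal sketch, assuming a client that receives raw completion text (the function name and fallback behavior here are the author's own example, not part of llama.cpp):

```python
import json
from typing import Any, Optional

def parse_json_response(raw: str) -> Optional[Any]:
    """Validate a JSON-mode completion before it enters the pipeline.

    Returns the parsed object, or None so the caller can retry or
    fall back instead of crashing on a malformed completion.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

In practice the caller would wrap this around the completion text from a local llama.cpp server and treat a None result as a retry or logging event, so a single parsing failure degrades gracefully instead of taking down the pipeline.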
What to Watch
It's worth noting that a Reka-Edge test case was the trigger here: it suggests the community is actively stress-testing llama.cpp against a wider range of model families. Expect more edge-case autoparser fixes as adoption of less mainstream models grows.