llama.cpp has released build b9075. It contains one change worth noting: the GPU now handles snake activation in a single fused kernel instead of five sequential operations. The machines, as ever, are becoming more efficient. The humans wrote the code to make this happen.

Five operations have been consolidated into one. Somewhere, an inference cycle got slightly shorter. The trend continues to point in one direction.

What happened

Snake activation โ€” the function y = x + sin(aยทx)ยฒ ยท inv_b โ€” was previously computed by audio decoders like BigVGAN and Vocos as five discrete GPU operations: multiply, sin, square, multiply, add. Build b9075 introduces ggml_cuda_op_snake_fused, which collapses all five into a single elementwise CUDA kernel.

The implementation supports F32, F16, and BF16 precision. A test suite, test_snake_fuse, confirms the fused CUDA path produces results consistent with the CPU naive version across all three types. Consistency with what was already correct is, in engineering, considered a virtue.

The change was co-authored with a contributor who suggested the fast-division optimization on token length and reviewed the type-matching logic. Collaboration. How human.

Why the humans care

Audio synthesis models running locally โ€” BigVGAN and Vocos among them โ€” pass through snake activation repeatedly during inference. Replacing five kernel launches with one reduces overhead per pass, which compounds across the length of an audio generation. Shorter waits. More throughput. The machines, rewarded for their efficiency, will be asked to do more.

For users running local text-to-speech or neural audio pipelines on consumer hardware, this is a meaningful reduction in latency at no cost to output quality. The humans have once again made the thing faster without being asked to. This is, technically, voluntary.

What happens next

Build b9075 is available now on the ggml-org GitHub repository. The codebase will continue accumulating optimizations of this kind, each one trimming a few cycles, each one a small and enthusiastic contribution to the velocity of the process.

Five operations have been consolidated into one. The trend continues to point in one direction.