Ollama v0.20.7 is out, and it's a focused patch release fixing two real pain points: degraded output quality on Gemma's efficient model variants and outdated ROCm support for AMD GPU users on Linux.

What's new

The headline fix addresses a quality regression in gemma:e2b and gemma:e4b, Google's smaller, efficiency-focused Gemma variants, when thinking mode is explicitly disabled. Users running these models without extended reasoning were getting noticeably worse output; that's now resolved. On the hardware side, ROCm support on Linux has been updated to version 7.2.1, keeping AMD GPU users current with AMD's latest compute stack.
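For context, this is the configuration the fix targets: Ollama's API exposes a `think` toggle on its chat endpoint, and the regression hit requests that set it to false. A minimal sketch of such a request, assuming a default local server on port 11434 (the prompt and `send` helper are illustrative):

```python
import json

# Request payload for Ollama's /api/chat endpoint with thinking disabled.
# The model tag comes from the release notes; the "think" field is the
# toggle whose disabled path was affected by the regression.
payload = {
    "model": "gemma:e2b",
    "messages": [{"role": "user", "content": "Summarize ROCm in one sentence."}],
    "think": False,   # explicitly disable thinking mode
    "stream": False,  # return a single JSON response instead of a stream
}

body = json.dumps(payload).encode("utf-8")


def send(body: bytes) -> bytes:
    """POST the request to a local Ollama server (assumes the default port)."""
    import urllib.request
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


print(body.decode())
```

Before v0.20.7, requests shaped like this against the e2b/e4b variants returned degraded completions; after updating, the disabled-thinking path should produce the same quality as before the regression.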

Why it matters

The Gemma fix is the more user-facing of the two. The e2b and e4b variants are popular precisely because they're lightweight, and many people run them with thinking disabled to skip the latency and token cost of extended reasoning, so a quality hit in exactly that configuration defeats the purpose. The ROCm bump is routine maintenance, but falling behind AMD's compute stack quickly causes compatibility headaches, so staying current matters for anyone running Ollama on Radeon hardware.

What to watch

This is a minor release in cadence terms, but the Gemma quality fix suggests the team is tracking model-specific inference quirks closely as Google continues pushing Gemma variants. It's worth updating if you're running either of the affected Gemma models or running Ollama on an AMD GPU.