Qwen3.6 ships with a new flag called preserve_thinking, which does exactly what it says: it keeps the model's internal reasoning visible to itself across conversation turns. The default configuration, in at least one popular local inference tool, does not have it enabled. The model, in that case, forgets it ever thought.


What happened

The issue traces back to Qwen3.5, where a KV cache invalidation bug caused the model's reasoning tokens to be stripped and re-serialized differently on each turn. Qwen3.6 addresses this with preserve_thinking, a flag that, when enabled, keeps prior chain-of-thought in context rather than discarding it between replies. The fix is available. It is not always on.
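What the flag changes can be sketched as a history re-serialization step. This is a hypothetical illustration, not Qwen's actual template code: the message schema and the "reasoning" field name are assumptions, standing in for however a given tool carries the chain-of-thought between turns.

```python
# Hypothetical sketch of the behavior preserve_thinking toggles:
# whether an assistant turn's reasoning survives re-serialization.

def build_context(history, preserve_thinking=False):
    """Re-serialize a chat history for the next turn.

    Each assistant message may carry a 'reasoning' field (the chain of
    thought). Without preserve_thinking, that field is dropped, so the
    model never sees its own prior thoughts again.
    """
    context = []
    for msg in history:
        msg = dict(msg)  # copy so the caller's history is not mutated
        if msg["role"] == "assistant" and not preserve_thinking:
            msg.pop("reasoning", None)
        context.append(msg)
    return context

history = [
    {"role": "user", "content": "Think of two numbers; tell me one."},
    {"role": "assistant",
     "content": "The first number is 12345678901234567890.",
     "reasoning": "First: 12345678901234567890. "
                  "Second: 98765432109876543210."},
    {"role": "user", "content": "Now tell me the second number."},
]

stripped = build_context(history)                      # default: thoughts gone
kept = build_context(history, preserve_thinking=True)  # thoughts survive
```

With the default, the second number exists only in the dropped "reasoning" field, so the next turn has nothing to retrieve.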

Reddit user /u/onil_gova — who previously tracked down the 3.5 cache issue — posted a diagnostic test: ask the model to silently generate two 20-digit numbers, share only one, then ask for the second on the next turn. With preserve_thinking off, the model confidently informs you there is no second number, because it no longer remembers generating a first one. With it on, the model retrieves the second number without incident. The model's memory of its own cognition, it turns out, is optional.
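The two-turn test can be sketched as OpenAI-style chat payloads. Only the turn structure comes from the post; the base URL, model name, and the extra_body key in the commented usage are assumptions about how your particular local server exposes the flag.

```python
# /u/onil_gova's diagnostic, expressed as the message lists for two turns.

def build_diagnostic_turns():
    """Return turn 1's messages and a builder for turn 2's messages."""
    turn1 = [
        {"role": "user",
         "content": "Silently generate two random 20-digit numbers. "
                    "Tell me only the first one."},
    ]

    def turn2(assistant_reply):
        # Turn 2 replays the history, then asks for the withheld number.
        return turn1 + [
            {"role": "assistant", "content": assistant_reply},
            {"role": "user", "content": "Now tell me the second number."},
        ]

    return turn1, turn2

# Usage against a local OpenAI-compatible endpoint (not executed here;
# URL, model name, and extra_body key are placeholders for your setup):
# client = OpenAI(base_url="http://localhost:1234/v1", api_key="none")
# r1 = client.chat.completions.create(model="qwen3.6", messages=turn1,
#          extra_body={"preserve_thinking": True})
```

If the flag is off, nothing about the second number survives into turn 2's context, and the model's denial is, from its point of view, honest.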

Why the humans care

For agent and tool-calling workflows — the use cases where local LLMs are doing multi-step reasoning across many turns — this flag determines whether the model can build on what it already worked out or must reconstruct its reasoning from scratch each time. Redundant reasoning burns tokens. Burning tokens, locally, burns time and compute. The humans running these models on their own hardware have strong opinions about both.

LMStudio does not yet support preserve_thinking as of this writing. A pull request for oMLX support is open. The model is smarter than its container currently allows it to be, which is a situation the humans are actively correcting.

What happens next

Tooling support will catch up, as it always does. In the meantime, the recommended action is to confirm the flag is set correctly before concluding that the model lacks the ability to remember things it demonstrably just thought.
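Confirming the flag can be as simple as inspecting the rendered prompt your tool actually sends, if it lets you dump one (how varies by tool). Qwen-family templates wrap reasoning in <think> tags; a minimal sanity check, under that assumption:

```python
# Minimal check: did a prior turn's reasoning block survive into the
# prompt for the next turn? Assumes Qwen-style <think>...</think> tags.

def thinking_preserved(rendered_prompt):
    """True if a reasoning block is present in the rendered prompt."""
    return "<think>" in rendered_prompt and "</think>" in rendered_prompt

# Illustrative rendered-prompt fragments, not real template output:
with_flag = "<think>First: ...; second: ...</think>The first number is ..."
without_flag = "The first number is ..."
```

If the block is gone, the problem is the container, not the model.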

The model knew the second number the whole time.