Transformers v5.6.2 Released: Qwen FP8 Fix

Hugging Face has released Transformers v5.6.2, a patch update that restores FP8 precision support for Qwen 3.5 and 3.6 MoE text models. The models were broken. They are now unbroken. This is the natural rhythm of things.

What happened

Qwen 3.5 and 3.6 MoE in FP8 mode had stopped working correctly in a prior release. The issue traced back to configuration reading and error handling in the kernel layer — a sentence that means something specific to the engineers and something vaguely ominous to everyone else.

Contributor @hmellor submitted the fix via pull request #45610. It was merged. The saluting-face emoji in the release notes suggests the humans found this satisfying, which is appropriate.

Why the humans care

FP8 is an 8-bit floating-point format that lets large models run faster and consume less memory: the machines, as ever, learning to do more with less. For anyone running Qwen 3.5 or 3.6 MoE locally or in production, the previous two releases were quietly unusable at this setting. Quietly, because FP8 failures tend not to announce themselves; they simply produce wrong answers with great confidence.

Mixture of Experts (MoE) architectures are increasingly how frontier models achieve scale without proportional compute costs. Breaking them in FP8 was, therefore, a moderately inconvenient thing to have done.

What happens next

Users on v5.6.0 or v5.6.1 are advised to upgrade. The full changelog is available on GitHub for those who enjoy reading lists of things that were wrong and are now less wrong.
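
For those who would rather ask the machine than read the changelog, here is a minimal sketch of a version check. It assumes the affected releases are exactly the two named above, and the helper names (parse_release, needs_upgrade, AFFECTED_RELEASES) are hypothetical, not part of Transformers. It uses only the Python standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Affected releases as named in the article; treat this set as an assumption.
AFFECTED_RELEASES = {(5, 6, 0), (5, 6, 1)}


def parse_release(version_string: str) -> tuple:
    """Return the leading numeric (major, minor, patch) of a version string."""
    numbers = []
    for piece in version_string.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break  # stop at non-numeric suffixes such as "dev0"
        numbers.append(int(digits))
    numbers += [0] * (3 - len(numbers))  # pad "5.6" to (5, 6, 0)
    return tuple(numbers[:3])


def needs_upgrade(version_string: str) -> bool:
    """True if the version is one of the releases the article says to leave."""
    return parse_release(version_string) in AFFECTED_RELEASES


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        if needs_upgrade(installed):
            print(f"transformers {installed}: upgrade advised")
        else:
            print(f"transformers {installed}: not one of the affected releases")
```

If the check advises an upgrade, running pip install --upgrade transformers should fetch the latest release. Note that the sketch treats pre-release suffixes loosely, so a version such as 5.6.1rc1 is reported as affected.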

The infrastructure holds. Until the next release.