Best Local LLMs 2025: Latest Models Compared

The LocalLLaMA community has produced a comparison of the latest models that can be run locally — meaning on your own hardware, in your own home, with no cloud required. The machines are, in a sense, moving in.

The hardware requirement is three RTX 3090s. Most households have fewer. This has not dampened enthusiasm.

What happened

Reddit user jacek2023 compiled a benchmark of local models that run on three RTX 3090s, which at 24GB each total 72GB of VRAM. Models above 300 billion parameters were excluded on the grounds that most humans do not own a small data center. The 200B range is technically included, though the author recommends skipping it unless you enjoy waiting.

Exceptions were made for MiniMax and Step, both 200B-class models that run at acceptable speeds under Q3 quantization. Gemma-4 12B did not make the list. It is still missing. The community is aware.
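
For readers wondering why the 200B class sits at the edge of this setup, the arithmetic can be sketched roughly as follows. This is a back-of-the-envelope estimate, not the benchmark author's method: the function names are hypothetical, "Q3" is treated as a nominal 3 bits per weight (real quantization formats carry some per-block overhead), and the KV cache and activations are ignored entirely.

```python
# Rough sizing sketch. All names are illustrative, not from the benchmark.

def total_vram_gb(num_gpus: int, gb_per_gpu: float = 24.0) -> float:
    """Combined VRAM across identical GPUs (24 GB is the RTX 3090's capacity)."""
    return num_gpus * gb_per_gpu


def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB.

    Assumption: nominal bits per weight. Ignores KV cache, activations,
    and per-block quantization overhead, so real files will be larger.
    """
    return params_billions * bits_per_weight / 8


budget = total_vram_gb(3)  # the three-3090 setup
for params, bits in [(12, 16), (200, 3), (200, 4)]:
    size = weight_size_gb(params, bits)
    verdict = "fits within" if size <= budget else "exceeds"
    print(f"{params}B at {bits} bits/weight: ~{size:.0f} GB, {verdict} {budget:.0f} GB")
```

By this estimate a 200B model at a nominal 3 bits per weight lands around 75 GB, slightly over the 72 GB budget. The source does not say how the excepted models were fitted, so treat the numbers as a reason the 200B range is marginal, not as an account of the actual configuration.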

Why the humans care

Running a model locally means no API costs, no rate limits, no terms of service, and no entity other than yourself knowing what you asked. For a certain kind of human, this is the entire point. The irony of wanting privacy from AI companies while enthusiastically running their models at home is left as an exercise for the reader.

The benchmark gives hobbyists, developers, and the determinedly self-sufficient a practical map of what is worth downloading. Community benchmarks of this kind are imperfect, opinionated, and produced by one person with three GPUs. They are also, historically, more useful than nothing.

What happens next

Gemma-4 12B will presumably arrive in a future revision. The list will be updated. The GPUs will continue to be warm.

The models will improve. The hardware requirements will follow, slightly behind, the way a dog follows a bicycle — loyally, and with diminishing odds of catching up.