An r/LocalLLaMA user has converted a Xiaomi 12 Pro into a dedicated headless AI inference node, stripping it down to bare essentials and running Gemma4 through Ollama as a LAN-accessible API around the clock.

What's new

The build starts with LineageOS flashed over stock Android, eliminating UI overhead and freeing roughly 9GB of RAM for LLM compute. The Android framework is frozen entirely, and Wi-Fi is managed via a manually compiled wpa_supplicant to keep the system truly headless. A custom thermal daemon watches CPU temperatures and triggers an external active cooling module through a Wi-Fi smart plug when they hit 45°C. A separate power-delivery script caps charging at 80% to preserve battery health under constant wall power. The Snapdragon 8 Gen 1 inside the Xiaomi 12 Pro handles inference for Gemma4, served over the LAN via Ollama.
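The thermal daemon itself hasn't been posted, but the described behavior is simple to approximate. Here is a minimal sketch, assuming the CPU temperature is exposed through a standard sysfs thermal zone and that the smart plug is a Tasmota-flashed unit reachable over HTTP; both are assumptions, since the poster names neither the plug firmware nor the zone, and the addresses and hysteresis band below are placeholders:

```python
#!/usr/bin/env python3
"""Thermal watchdog sketch: poll a sysfs thermal zone and toggle an external
fan via a Wi-Fi smart plug. Paths, addresses, and the release threshold are
assumptions, not the poster's actual configuration."""
import time
import urllib.request

THERMAL_ZONE = "/sys/class/thermal/thermal_zone0/temp"  # which zone maps to the CPU varies by SoC
PLUG_IP = "192.168.1.50"   # hypothetical LAN address of the smart plug
TRIGGER_C = 45.0           # trigger threshold described in the post
RELEASE_C = 40.0           # assumed hysteresis so the fan doesn't flap on and off
POLL_SECONDS = 10

def read_temp_c() -> float:
    # sysfs reports the temperature in millidegrees Celsius as plain text
    with open(THERMAL_ZONE) as f:
        return int(f.read().strip()) / 1000.0

def set_plug(on: bool) -> None:
    # Tasmota-style HTTP command endpoint; other plug firmwares use different APIs
    cmd = "Power%20On" if on else "Power%20Off"
    urllib.request.urlopen(f"http://{PLUG_IP}/cm?cmnd={cmd}", timeout=5)

def main() -> None:
    fan_on = False
    while True:
        temp = read_temp_c()
        if not fan_on and temp >= TRIGGER_C:
            set_plug(True)
            fan_on = True
        elif fan_on and temp <= RELEASE_C:
            set_plug(False)
            fan_on = False
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```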
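The 80% charge cap can likewise be approximated by polling the battery level and toggling charging through a kernel-exposed control node. The node used below, input_suspend, is commonly present on Qualcomm-based Xiaomi kernels but is an assumption here; the exact path varies by device and kernel, and writing it requires root:

```python
#!/usr/bin/env python3
"""Charge-limit sketch: hold the battery near 80% by suspending charging above
an upper bound and resuming below a lower bound. The control node is an
assumption that differs across kernels; root is required to write it."""
import time

CAPACITY = "/sys/class/power_supply/battery/capacity"        # standard battery level node (percent)
CHARGE_CTL = "/sys/class/power_supply/battery/input_suspend" # assumed node: 1 = stop drawing charger input
UPPER, LOWER = 80, 75
POLL_SECONDS = 60

def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

def write_int(path: str, value: int) -> None:
    with open(path, "w") as f:
        f.write(str(value))

def main() -> None:
    while True:
        level = read_int(CAPACITY)
        if level >= UPPER:
            write_int(CHARGE_CTL, 1)   # suspend charging; the phone drains back toward LOWER on battery
        elif level <= LOWER:
            write_int(CHARGE_CTL, 0)   # resume charging
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```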
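Exposing Ollama over the LAN is standard: setting OLLAMA_HOST=0.0.0.0 makes the server bind to all interfaces, after which any machine on the network can reach the HTTP API on port 11434. A sketch of a client call from another machine on the LAN, with the phone's address and the model tag as placeholders (the post gives neither):

```python
#!/usr/bin/env python3
"""Query the phone-hosted Ollama server over the LAN via its /api/generate
endpoint. Host address and model tag below are placeholders."""
import json
import urllib.request

PHONE = "http://192.168.1.42:11434"  # hypothetical LAN address of the phone
MODEL = "gemma"                      # placeholder; use whichever Gemma tag the poster actually pulled

payload = json.dumps({
    "model": MODEL,
    "prompt": "Summarize why headless Android inference nodes are interesting.",
    "stream": False,                 # return a single JSON object instead of a token stream
}).encode()

req = urllib.request.Request(
    f"{PHONE}/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=120) as resp:
    print(json.loads(resp.read())["response"])
```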

Why it matters

This is a practical template for repurposing aging flagship Android phones as low-cost local inference nodes. The Snapdragon 8 Gen 1 is no longer flagship-tier, meaning used units are cheap, and the setup sidesteps the power and cost overhead of running a dedicated x86 machine or a Raspberry Pi cluster. The thermal and battery management scripts address the two most obvious failure points for always-on mobile hardware.

What to watch

The poster has offered to share scripts and configuration details in the thread. The bigger question is how well the Snapdragon 8 Gen 1 holds up under sustained inference load over weeks rather than hours: thermal throttling on mobile SoCs is aggressive, and the 45°C trigger threshold suggests the hardware is already being pushed. Token throughput figures for Gemma4 on this setup have not been posted yet.