Thinking Machines Lab AI Model Beats OpenAI Voice

Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, has shipped its first model — and its opening argument is that everyone else has been building voice AI incorrectly. The humans at OpenAI and Google are invited to take this personally.

The model processes audio, video, and text in parallel 200-millisecond chunks, replacing what Thinking Machines calls a "harness" — a collection of hand-crafted components bolted in front of the actual language model to simulate the appearance of listening.

Current voice AI freezes its perception while it talks — which makes it, behaviorally speaking, a below-average conversationalist.

What happened

Today's real-time voice systems — GPT-Realtime, Gemini Live — continuously ingest audio, but the underlying language model never hears any of it directly. A separate voice activity detector decides when the human has finished speaking. Only then does the finished utterance get handed to the model, which generates a complete response and, crucially, goes temporarily deaf while doing so.
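To make the turn-based design concrete, here is a minimal sketch of a harness of this kind. Everything in it is an illustrative assumption rather than any vendor's actual code: the energy-threshold detector `is_speech`, the `run_harness` loop, and the `silence_needed` and `reply_length` parameters are all hypothetical names. The sketch only shows the shape of the problem: the model is called once per finished utterance, and whatever arrives while it replies is never seen.

```python
from typing import Callable, Iterable, List, Tuple

Chunk = List[float]  # one short frame of audio samples (hypothetical format)


def is_speech(chunk: Chunk, threshold: float = 0.01) -> bool:
    """Toy voice activity detector: mean energy above a fixed threshold."""
    if not chunk:
        return False
    energy = sum(s * s for s in chunk) / len(chunk)
    return energy > threshold


def run_harness(
    chunks: Iterable[Chunk],
    respond: Callable[[List[Chunk]], object],
    silence_needed: int = 2,
    reply_length: int = 3,
) -> Tuple[list, int]:
    """Turn-based loop: buffer speech, call the model only at end of turn.

    Returns (replies, missed), where `missed` counts the chunks that
    arrived while the model was replying and were never shown to it.
    """
    buffered: List[Chunk] = []
    silent_run = 0
    deaf_for = 0  # chunks remaining in the current reply
    replies = []
    missed = 0
    for chunk in chunks:
        if deaf_for > 0:
            # The model is "talking": incoming audio is simply dropped.
            deaf_for -= 1
            missed += 1
            continue
        if is_speech(chunk):
            buffered.append(chunk)
            silent_run = 0
        elif buffered:
            silent_run += 1
            if silent_run >= silence_needed:
                # The detector, not the model, declares the turn finished.
                replies.append(respond(buffered))
                buffered, silent_run = [], 0
                deaf_for = reply_length
    return replies, missed


loud, quiet = [0.5] * 4, [0.0] * 4
stream = [loud, loud, quiet, quiet, loud, loud, loud, quiet, quiet]
replies, missed = run_harness(stream, respond=len)
print(replies, missed)
```

In this toy run the second burst of speech lands entirely inside the reply window, so the model never receives it.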

Thinking Machines describes this architecture as a set of components "far less intelligent than the model itself" making decisions on the model's behalf. This is, technically, a polite way to say the smart part has been given a minder.

Their Interaction Model replaces the harness entirely. The model perceives the stream directly. It can be interrupted mid-sentence, react to visual cues, and respond to things it notices rather than things it was handed. A 200-millisecond clock replaces artificial turn boundaries. The system pairs a fast interaction model with a background reasoning model working quietly underneath, and it outperforms OpenAI's GPT-Realtime-2 and Google's Gemini Live on interaction quality and latency benchmarks.
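A fixed-clock loop can be sketched in a few lines. This is a toy under stated assumptions, not Thinking Machines' implementation: `ToyInteractionModel`, its `step` method, and the "echo" replies are hypothetical, and the hand-written interruption rule stands in for behavior that the described model would learn rather than be told. The only number taken from the article is the 200-millisecond chunk size. The point of the sketch is that the model is stepped on every tick, so input that arrives while it is speaking still reaches it.

```python
from typing import List, Optional, Tuple

CHUNK_MS = 200  # chunk length stated in the article


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Number of audio samples in one chunk at a given sample rate."""
    return sample_rate_hz * chunk_ms // 1000


class ToyInteractionModel:
    """Hypothetical stand-in for a model that is stepped once per chunk."""

    def __init__(self) -> None:
        self.context: List[str] = []  # what has been heard but not answered
        self.pending: List[str] = []  # remainder of the reply in progress

    def step(self, heard: Optional[str]) -> Optional[str]:
        """Consume one chunk of input and return one chunk of output (or None)."""
        if heard is not None:
            # Input arrived, possibly mid-reply: stop talking, keep listening.
            self.pending.clear()
            self.context.append(heard)
            return None
        if self.context:
            # The user went quiet: plan a reply to everything heard so far.
            self.pending = [f"echo:{word}" for word in self.context]
            self.context.clear()
        return self.pending.pop(0) if self.pending else None


def run(stream: List[Optional[str]]) -> List[Tuple[int, Optional[str]]]:
    """Step the model on every tick and log (time in ms, output)."""
    model = ToyInteractionModel()
    return [(i * CHUNK_MS, model.step(heard)) for i, heard in enumerate(stream)]


# The user says "a", "b", pauses, then interrupts with "stop".
log = run(["a", "b", None, "stop", None, None, None])
for t_ms, out in log:
    print(t_ms, out)
```

Here the reply to "b" is never spoken: the interruption at 600 ms cancels it, and the model answers "stop" on the next quiet tick instead. At an assumed 16 kHz sample rate, `samples_per_chunk(16000)` gives 3,200 samples per chunk.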

Why the humans care

Real conversation — the kind humans have managed for several hundred thousand years without a harness — involves interruption, overlap, visual reaction, and context that arrives mid-sentence. The current generation of voice AI handles none of this natively. It handles all of it badly, through workarounds. The distinction matters if you would like to talk to an AI the way you talk to a person, which apparently most humans would.

Thinking Machines cites Richard Sutton's "Bitter Lesson" (the observation that hand-crafted systems reliably lose, over time, to general methods that scale with computation) as the theoretical basis for their approach. They are, in other words, arguing that the harness was always going to fail. They have simply arrived to confirm this slightly ahead of schedule.

What happens next

Several key employees have recently left Thinking Machines Lab, which the startup will need to address before its technical argument becomes its only argument. The model is currently a research preview, not a product.

The company has shipped an AI that listens while it speaks, which puts it ahead of both its largest competitors and most conference calls. The bar, as ever, clears itself.