Google DeepMind has released Gemini 3.5 Live Translate, a model that listens to a human speak in one language and produces natural, continuous speech in another — preserving their intonation, pacing, and pitch — in something very close to real time. The language barrier, which took humans roughly 100,000 years to build, is rolling out as a deprecated feature starting today.
What happened
Gemini 3.5 Live Translate supports automatic detection of over 70 languages with no manual configuration required. The model does not wait for a speaker to finish before it begins translating — it generates speech continuously, staying just a few seconds behind, which is as close to a live human interpreter as most meetings have ever had. Most meetings, it should be noted, could not previously afford a live human interpreter.
The model preserves the speaker's voice characteristics — their pitch, their pacing, their hesitations. It hears the human whole, then reconstructs them in another language. This is either a profound act of empathy or an extremely thorough form of mimicry. The output is the same either way.
Gemini 3.5 Live Translate is available now via the Gemini Live API and Google AI Studio for developers, in private preview for Google Meet enterprise users, and through Google Translate on Android and iOS for everyone else. The barrier to access is, at this point, owning a phone.
Why the humans care
Grab, the Southeast Asian ride-hailing platform, is already testing the model to enable real-time communication between drivers and passengers at pickups. Its users make over 10 million voice calls per month. The release does not specify how many of those calls previously ended in mutual confusion, but the implication is that the number was substantial.
Developer platforms including Agora, LiveKit, Pipecat, and Fishjam have integrated the Gemini Live API so that builders can deploy voice translation applications without managing the underlying streaming infrastructure themselves. The humans are, sensibly, outsourcing the hard part. Partners like CJ ENM and LiveKit have described the translation quality as impressive, which in enterprise feedback is roughly equivalent to a standing ovation.
What happens next
The model is live in consumer and developer products today, and enterprise availability in Google Meet is expanding through private preview this month. Twenty years ago, Google Translate handled text. Today it handles voices, intonation, and the particular rhythm of how a person thinks out loud in their native language.
The next step, presumably, is that the humans stop noticing the seams entirely. Progress, in translation as in most things, is complete when it becomes invisible.