Google DeepMind released Gemini 3.5 Live Translate, an audio model for real-time cross-language communication. The announcement, made via X by @kimmonismus, positions the model as a fast, cross-language tool built on the Gemini 3.5 architecture.
Key facts
- Gemini 3.5 Live Translate targets real-time audio translation.
- No pricing, language pairs, or latency data disclosed.
- Built on Gemini 3.5 architecture for low latency.
- Competitors: OpenAI GPT-4o, Meta SeamlessM4T, Anthropic Claude 3.5.
- Google Translate handles text/audio; this is a dedicated audio model.
Google DeepMind released Gemini 3.5 Live Translate, an audio model for real-time cross-language communication. The announcement, made via X by @kimmonismus, positions the model as a fast, cross-language tool built on the Gemini 3.5 architecture.
The model is built on the Gemini 3.5 architecture, optimized for low-latency audio processing. The short announcement — 'Say hello, hola, 你好 to Gemini 3.5 Live Translate: our latest audio model built for fast, cross-language communication' — does not specify supported language pairs, latency targets, or pricing.
What the announcement lacks
Google DeepMind has not disclosed whether the model is available as an API, integrated into existing products like Google Translate or YouTube, or limited to internal testing. No pricing, supported language pairs, latency benchmarks, or regional availability were disclosed. The company did not provide a technical paper or blog post with performance metrics.
The announcement arrives as OpenAI, Meta, and Anthropic race to ship multimodal models with voice capabilities. OpenAI's GPT-4o with audio, Meta's SeamlessM4T, and Anthropic's Claude 3.5 Sonnet all offer some form of speech translation, though none have claimed real-time performance at scale. The key differentiator for Gemini 3.5 Live Translate would be latency — the time between spoken input and translated output — but Google DeepMind has not released numbers.
The competitive context
Google has long offered translation via Google Translate, which handles text and limited audio. Gemini 3.5 Live Translate appears to be a dedicated audio model, potentially offering lower latency than the current Translate pipeline, which passes audio through speech recognition then text translation then speech synthesis. An end-to-end audio model could cut that pipeline, reducing latency from seconds to sub-second.
Meta's SeamlessM4T, released in 2023, supports nearly 100 languages for speech-to-speech translation. OpenAI's GPT-4o, launched in 2024, demonstrated real-time voice translation in demos but has not been widely deployed. Anthropic's Claude 3.5 Sonnet supports text translation but lacks native audio.
Google DeepMind's move signals that real-time audio translation is becoming a standard capability for frontier AI models, not a niche feature.
What's missing
Without latency benchmarks, language pair coverage, or an API endpoint, the announcement is effectively a product teaser. The company did not disclose the training dataset size, model parameter count, or inference hardware requirements. It is unclear whether the model is available today or in preview.
[According to @kimmonismus] the announcement was made via a retweet of Google DeepMind's post. The short format suggests an early announcement, possibly ahead of a broader Gemini 3.5 release.
What to watch

Watch for whether DeepMind publishes latency benchmarks or language pair coverage. If the model is end-to-end, it could cut translation latency from seconds to sub-second. Also monitor whether the model appears in Google Translate, YouTube captions, or Google Meet within 90 days.









