Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

Google just introduced Gemini 3.5 Live Translate. It is the company’s newest audio model for live speech-to-speech translation. Speech-to-speech means spoken audio goes in, and translated spoken audio comes out. The model automatically detects over 70 languages and generates translated speech. It preserves the speaker’s intonation, pacing, and pitch in the output. Turn-by-turn systems wait for a speaker to finish before responding. Gemini 3.5 Live Translate generates speech continuously instead. It balances a trade-off between waiting for context and translating immediately. More context improves quality. Faster output keeps the translation in sync with the speaker. The result stays a few seconds behind the speaker throughout a session.

Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is a single audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as the audio streams in, rather than after a full sentence. It handles multilingual input without manual configuration. Its noise robustness lets applications run in loud, unpredictable environments.

The model is rolling out across three surfaces. Developers get it in public preview through the Gemini Live API and Google AI Studio. Enterprises get a private preview in Google Meet starting this month. Everyone else gets it through the Google Translate app on Android and iOS.

How the Continuous Streaming Works

The design difference matters for building real-time features. A conversational Live agent uses turn-based interactions. It relies on pauses, intent detection, and interruption handling. Live Translation uses continuous stream processing instead. It translates as the speaker talks, without waiting for turns to finish.

To meet strict real-time latency thresholds, the translation path accepts audio input only. Text input isn’t supported in translation mode. The model also drops tool use and system instructions in this mode. That keeps it a focused translator pipeline rather than a general agent.

Building With the Live API

Developers configure translation in the Live API session setup. You set a translationConfig block inside the generationConfig. The targetLanguageCode field takes a BCP-47 code, such as "pl" or "es", and defaults to "en". BCP-47 is the standard format for language tags like en or pt-BR. The echoTargetLanguage boolean controls what happens to input that’s already in the target language. When true, the model echoes that speech. When false, it stays silent. You can also enable inputAudioTranscription and outputAudioTranscription for text transcripts.
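The configuration described above can be sketched as a plain setup payload. This is a minimal illustration, not the official client: the field names translationConfig, targetLanguageCode, echoTargetLanguage, inputAudioTranscription, and outputAudioTranscription come from the article, while the top-level "setup" wrapper, the "models/" prefix, the placement of the transcription fields, and the helper name build_translate_setup are assumptions made for the example.

```python
import json

# Model ID as given in the article.
MODEL_ID = "gemini-3.5-live-translate-preview"


def build_translate_setup(target_language_code: str = "en",
                          echo_target_language: bool = False,
                          transcripts: bool = True) -> dict:
    """Build a session-setup payload for translation mode.

    Assumptions (not confirmed by the article): the "setup" wrapper, the
    "models/" prefix, and where the transcription fields are placed.
    """
    setup = {
        "model": f"models/{MODEL_ID}",
        "generationConfig": {
            "translationConfig": {
                # BCP-47 tag such as "pl", "es", or "pt-BR".
                "targetLanguageCode": target_language_code,
                # True: echo speech already in the target language.
                # False: stay silent for that speech.
                "echoTargetLanguage": echo_target_language,
            },
        },
    }
    if transcripts:
        # Empty objects stand in for "enabled"; the exact shape is assumed.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return {"setup": setup}


if __name__ == "__main__":
    # Example: translate into Polish and echo speech that is already Polish.
    print(json.dumps(build_translate_setup("pl", echo_target_language=True), indent=2))
```

Note that the payload carries no tools or system instructions, which matches the article’s description of translation mode as a focused pipeline.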

Audio formats are fixed. Input is raw 16-bit PCM at 16kHz, mono, little-endian. Output is raw 16-bit PCM at 24kHz, mono, little-endian. PCM is uncompressed raw audio. You send audio in chunks of 100ms. For client-side apps, ephemeral tokens on the v1alpha endpoint avoid exposing your API key.
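The audio numbers above translate directly into buffer sizes. The following sketch, which uses only the Python standard library, works out the bytes in a 100ms chunk, splits an input buffer accordingly, and wraps 24kHz output in a WAV container for playback. The helper names (chunk_size_bytes, iter_chunks, pcm_to_wav) are hypothetical, and sending the chunks over the network is left out.

```python
import io
import wave

BYTES_PER_SAMPLE = 2      # 16-bit PCM
INPUT_RATE_HZ = 16_000    # input sample rate from the article
OUTPUT_RATE_HZ = 24_000   # output sample rate from the article
CHUNK_MS = 100            # chunk duration from the article


def chunk_size_bytes(rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Bytes in one mono 16-bit chunk of the given duration."""
    return rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


def iter_chunks(pcm: bytes, rate_hz: int = INPUT_RATE_HZ, chunk_ms: int = CHUNK_MS):
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_size_bytes(rate_hz, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def pcm_to_wav(pcm: bytes, rate_hz: int = OUTPUT_RATE_HZ) -> bytes:
    """Wrap raw mono 16-bit little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    one_second_of_silence = bytes(INPUT_RATE_HZ * BYTES_PER_SAMPLE)
    chunks = list(iter_chunks(one_second_of_silence))
    print(chunk_size_bytes(INPUT_RATE_HZ), len(chunks))  # 3200 10
```

At 16kHz, a 100ms input chunk is 1,600 samples, or 3,200 bytes. The same duration of 24kHz output is 4,800 bytes.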

Dimension	Live Agent	Live Translation
Model role	Assistant that listens, reasons, and acts	Interpreter / real-time translator pipeline
Interaction	Turn-based, with interruption handling	Continuous stream processing, no turns
Tools	Function calling, Google Search, instructions	Translation only, no tools or instructions
Inputs	Text, audio, video, and image	Audio only, for strict latency
Configuration	Generation, speech, tools, instructions	`targetLanguageCode` and `echoTargetLanguage`

Use Case

The model targets live interpretation across several settings. Google lists multilingual calls, conferences, classes, and broadcasts. Developer platforms reduce the integration work for real-time media. Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already use the Live API. These platforms handle the complex real-time media streaming infrastructure. That lets developers focus on the user experience instead.

Google’s example app demonstrates dubbing and simultaneous multi-language translation. Grab is testing the model for driver-and-traveler communication at pickups. Grab users make over 10 million voice calls per month. CJ ENM, LiveKit, and others reported positive feedback on quality, accuracy, and low latency.

How It Changes Google Meet and Translate

According to Google’s official launch announcement, Google Meet will soon use 3.5 Live Translate for speech translation. The table shows the stated before-and-after for Meet.

Capability	Previous Meet	With 3.5 Live Translate
Languages	5	70+
Combinations per meeting	Only to and from English	2000+ combinations
Access	Existing interface	Updated interface for immediate access
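As a rough sanity check on the "2000+ combinations" figure, the short calculation below counts language pairs for exactly 70 languages. The article does not say how Google counts combinations, so both the language count of exactly 70 and the reading of a combination as an unordered pair are assumptions.

```python
from math import comb

LANGUAGES = 70  # assumed lower bound taken from "70+"

# Unordered pairs: A<->B counted once.
unordered_pairs = comb(LANGUAGES, 2)

# Ordered pairs: A->B and B->A counted separately.
ordered_pairs = LANGUAGES * (LANGUAGES - 1)

if __name__ == "__main__":
    print(unordered_pairs)  # 2415
    print(ordered_pairs)    # 4830
```

Under those assumptions, the unordered count of 2,415 is consistent with the "2000+" in the table.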

The Meet update is in private preview for select enterprise Workspace customers this month. A broader rollout follows later this year. In the Translate app, the Live translate feature works with any connected headphones. It mirrors the speaker’s tone across 70+ languages. Android also includes a listening mode. You hold the phone to your ear like a regular call. The translated audio then streams through the earpiece, without others hearing.

Key Takeaways

Gemini 3.5 Live Translate is Google’s latest audio model for live speech-to-speech translation across 70+ languages.
It streams continuously instead of turn-by-turn, staying a few seconds behind the speaker.
Developers can configure it via the Live API using targetLanguageCode and echoTargetLanguage; input is audio-only, with 16kHz PCM in and 24kHz PCM out.
It rolls out to the Gemini Live API, Google Meet (5→70+ languages), and the Translate app.
All generated audio carries an imperceptible SynthID watermark for detectability.

The post Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API appeared first on MarkTechPost.
