
Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

Google just introduced Gemini 3.5 Live Translate, its newest audio model for live speech-to-speech translation. Speech-to-speech means spoken audio goes in and translated spoken audio comes out. The model automatically detects more than 70 languages and generates translated speech that preserves the speaker's intonation, pacing, and pitch. Turn-by-turn systems wait for a speaker to finish before responding. Gemini 3.5 Live Translate instead generates speech continuously. This requires balancing a trade-off between waiting for context and translating immediately: more context improves quality, while faster output keeps the translation in sync with the speaker. The result stays a few seconds behind the speaker throughout a session.
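To make the context-versus-latency trade-off concrete, here is a toy Python sketch. It uses the generic wait-k policy from simultaneous-translation research as a stand-in: the system reads k source words before emitting its first translated word, then emits one word per new source word. Google has not said that Gemini 3.5 Live Translate works this way. The function names and the simplified lag measure are illustrative assumptions.

```python
"""Toy illustration of the context-vs-latency trade-off in streaming translation.

Uses the generic wait-k policy as a stand-in. This is NOT a description of how
Gemini 3.5 Live Translate decides when to speak.
"""


def wait_k_schedule(num_source: int, num_target: int, k: int) -> list[int]:
    """For each target word i, return how many source words have been heard
    before it is emitted under wait-k: min(k + i, num_source)."""
    return [min(k + i, num_source) for i in range(num_target)]


def average_lag(schedule: list[int]) -> float:
    """Simplified lag: average number of source words heard ahead of each
    target word, sum(read_i - i) / n. Not the exact Average Lagging metric."""
    return sum(read - i for i, read in enumerate(schedule)) / len(schedule)


if __name__ == "__main__":
    # 10 source words in, 10 translated words out.
    for k in (1, 3, 10):
        sched = wait_k_schedule(10, 10, k)
        print(f"k={k}: first output after {sched[0]} source words, "
              f"average lag {average_lag(sched):.1f}")
```

With 10 words in and out, k=1 starts speaking after one word with an average lag of 1.0 words, k=3 averages 2.7, and k=10 waits for the whole sentence (turn-by-turn behavior) and averages 5.5. Larger k gives each translated word more context at the cost of falling further behind the speaker.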

Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is a single audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as the audio streams in, rather than after a full sentence. It handles multilingual input without any manual language configuration, and its noise robustness lets applications run in loud, unpredictable environments.

The model is rolling out across three surfaces. Developers get it in public preview through the Gemini Live API and Google AI Studio. Enterprises get a private preview in Google Meet starting this month. Everyone else gets it through the Google Translate app on Android and iOS.

How the Continuous Streaming Works

This design difference matters when building real-time features. A conversational Live agent uses turn-based interaction and relies on pauses, intent detection, and interruption handling. Live Translation uses continuous stream processing instead: it translates while the speaker talks, without waiting for turns to complete.

To meet strict real-time latency thresholds, the translation path accepts audio input only; text input is not supported in translation mode. The model also drops tool use and system instructions in this mode, which keeps it a focused translation pipeline rather than a general agent.
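A client can fail fast, before opening a session, by checking its own setup against these restrictions. The sketch below is hypothetical: the check functions are illustrative, and the top-level tools and systemInstruction key names are assumed from the general Live API setup shape rather than taken from this release.

```python
# Hypothetical client-side guard mirroring the translation-mode restrictions.
# Assumption: "tools" and "systemInstruction" are top-level setup keys.
FORBIDDEN_IN_TRANSLATION = ("tools", "systemInstruction")


def is_translation_session(setup: dict) -> bool:
    """Treat a session as translation mode if translationConfig is present."""
    return "translationConfig" in setup.get("generationConfig", {})


def check_translation_setup(setup: dict) -> list[str]:
    """Return human-readable problems; an empty list means no conflicts found."""
    if not is_translation_session(setup):
        return []  # regular Live agent session: tools and instructions are fine
    return [f"{key} is not supported in translation mode"
            for key in FORBIDDEN_IN_TRANSLATION if key in setup]


def is_allowed_input(mime_type: str) -> bool:
    """Translation mode is audio-only; this sketch accepts raw PCM audio only."""
    return mime_type.startswith("audio/pcm")


if __name__ == "__main__":
    setup = {
        "generationConfig": {"translationConfig": {"targetLanguageCode": "es"}},
        "systemInstruction": {"parts": [{"text": "Be formal."}]},
    }
    print(check_translation_setup(setup))
    print(is_allowed_input("audio/pcm;rate=16000"), is_allowed_input("text/plain"))
```

Running this flags the system instruction and reports that raw PCM is accepted while plain text is not. Catching these conflicts locally gives a clearer error than a rejected session.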

Building With the Live API

Developers configure translation in the Live API session setup by adding a translationConfig block inside generationConfig. The targetLanguageCode field takes a BCP-47 code, such as "pl" or "es" (BCP-47 is the standard format for language tags like en or pt-BR); targetLanguageCode defaults to "en". The echoTargetLanguage boolean controls what happens to input that is already in the target language: when true, the model echoes that speech; when false, it stays silent. You can also enable inputAudioTranscription and outputAudioTranscription to receive text transcripts.
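The fields above can be assembled into a session setup message. This is a minimal sketch that assumes the raw WebSocket Live API's JSON setup shape. The "models/" prefix, the placement of inputAudioTranscription and outputAudioTranscription at the top level of setup, empty objects as the way to enable them, and the helper's echo default of False are all assumptions to verify against Google's Live API reference. The BCP-47 regex is a deliberately loose check, not a full validator.

```python
import json
import re

# Loose check for simple BCP-47 tags such as "en", "es", "pt-BR", "zh-Hant", "es-419".
# Simplified pattern (language, optional script, optional region); not a full validator.
_SIMPLE_BCP47 = re.compile(r"^[a-zA-Z]{2,3}(-[a-zA-Z]{4})?(-([a-zA-Z]{2}|[0-9]{3}))?$")

MODEL = "gemini-3.5-live-translate-preview"


def build_translation_setup(target_language_code: str = "en",
                            echo_target_language: bool = False,
                            transcribe: bool = True) -> dict:
    """Build a setup message for a translation session (sketch)."""
    if not _SIMPLE_BCP47.match(target_language_code):
        raise ValueError(f"not a simple BCP-47 tag: {target_language_code!r}")
    setup = {
        "model": f"models/{MODEL}",  # assumption: "models/" prefix
        "generationConfig": {
            "translationConfig": {
                "targetLanguageCode": target_language_code,
                "echoTargetLanguage": echo_target_language,
            }
        },
    }
    if transcribe:
        # Assumption: empty objects enable transcription, and both fields sit
        # at the top level of setup rather than inside generationConfig.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return {"setup": setup}


if __name__ == "__main__":
    print(json.dumps(build_translation_setup("pl"), indent=2))
```

The helper mirrors the documented "en" default for targetLanguageCode and rejects obviously malformed tags such as "english" before anything is sent.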

Audio formats are fixed. Input is raw 16-bit PCM at 16kHz, mono, little-endian. Output is raw 16-bit PCM at 24kHz, mono, little-endian. PCM is uncompressed raw audio. You send audio in 100ms chunks. For client-side apps, ephemeral tokens on the v1alpha endpoint avoid exposing your API key.
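The chunk arithmetic follows directly from these formats: 16,000 samples/s × 0.1 s × 2 bytes = 3,200 bytes per 100ms input chunk, and 24,000 × 0.1 × 2 = 4,800 bytes per 100ms of output. The Python sketch below converts float samples to 16-bit little-endian PCM, slices them into 100ms chunks, and measures the playback length of received audio. The realtimeInput message wrapper, including its base64 encoding and the "audio/pcm;rate=16000" MIME string, is an assumption modeled on the raw WebSocket Live API and should be checked against current documentation.

```python
import base64
import struct

INPUT_RATE_HZ = 16_000   # input: 16-bit PCM, 16 kHz, mono, little-endian
OUTPUT_RATE_HZ = 24_000  # output: 16-bit PCM, 24 kHz, mono, little-endian
BYTES_PER_SAMPLE = 2     # 16-bit mono
CHUNK_MS = 100


def chunk_bytes(rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Bytes in one chunk of mono 16-bit PCM: samples per chunk * 2."""
    return rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


def floats_to_pcm16le(samples: list[float]) -> bytes:
    """Clamp floats to [-1, 1] and pack them as little-endian signed 16-bit ints."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def iter_chunks(pcm: bytes, size: int):
    """Yield fixed-size chunks; the final chunk may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def output_seconds(pcm: bytes) -> float:
    """Playback duration of received 24 kHz mono 16-bit audio."""
    return len(pcm) / (OUTPUT_RATE_HZ * BYTES_PER_SAMPLE)


def to_realtime_message(chunk: bytes) -> dict:
    """Assumption: message shape and MIME string follow the raw WebSocket Live API."""
    return {"realtimeInput": {"audio": {
        "data": base64.b64encode(chunk).decode("ascii"),
        "mimeType": f"audio/pcm;rate={INPUT_RATE_HZ}",
    }}}


if __name__ == "__main__":
    one_second = floats_to_pcm16le([0.0] * INPUT_RATE_HZ)
    chunks = list(iter_chunks(one_second, chunk_bytes(INPUT_RATE_HZ)))
    print(len(chunks), len(chunks[0]))
```

One second of 16kHz input produces 10 chunks of 3,200 bytes each. Summing output_seconds over received audio is one simple way to track how far playback trails the speaker.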

Dimension | Live Agent | Live Translation
Model role | Assistant that listens, reasons, and acts | Interpreter / real-time translator pipeline
Interaction | Turn-based, with interruption handling | Continuous stream processing, no turns
Tools | Function calling, Google Search, instructions | Translation only, no tools or instructions
Inputs | Text, audio, video, and images | Audio only, for strict latency
Configuration | Generation, speech, tools, instructions | targetLanguageCode and echoTargetLanguage

Use Case

The model targets live interpretation in several settings; Google lists multilingual calls, meetings, classes, and broadcasts. Developer platforms reduce the integration work for real-time media: Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already use the Live API. These platforms handle the complex real-time media streaming infrastructure, which lets developers focus on the user experience instead.

Google's example app demonstrates dubbing and simultaneous multi-language translation. Grab is testing the model for driver-and-traveler communication at pickups; Grab users make over 10 million voice calls per month. CJ ENM, LiveKit, and others reported positive feedback on quality, accuracy, and low latency.

How It Changes Google Meet and Translate

According to Google's official announcement, Google Meet will soon use 3.5 Live Translate for speech translation. The table below shows the stated before-and-after for Meet.

Capability | Previous Meet | With 3.5 Live Translate
Languages | 5 | 70+
Combinations per meeting | Only to and from English | 2,000+ combinations
Access | Existing interface | Updated interface for instant access

The Meet update is in private preview for select enterprise Workspace customers this month, with a broader rollout later this year. In the Translate app, the Live translate feature works with any connected headphones and mirrors the speaker's tone across 70+ languages. Android also includes a listening mode: you hold the phone to your ear like a regular call, and the translated audio plays through the earpiece so people nearby don't hear it.

Key Takeaways

  • Gemini 3.5 Live Translate is Google's newest audio model for live speech-to-speech translation across 70+ languages.
  • It streams continuously instead of turn-by-turn, staying a few seconds behind the speaker.
  • Developers can configure it through the Live API using targetLanguageCode and echoTargetLanguage; input is audio only, with 16kHz PCM in and 24kHz PCM out.
  • It rolls out to the Gemini Live API, Google Meet (5→70+ languages), and the Translate app.
  • All generated audio carries an imperceptible SynthID watermark for detectability.


Check out the Model Card and Technical details.


The post Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API appeared first on MarkTechPost.
