Google has introduced a new text-to-speech AI model dubbed Gemini 3.1 Flash TTS. According to the tech giant, the new model delivers improved controllability, expressivity and quality. Google also claims that Gemini 3.1 Flash TTS is its most natural and expressive model yet.
On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind human preferences, the model achieved an Elo score of 1,211. Google says that Artificial Analysis has also positioned Gemini 3.1 Flash TTS within its ‘most attractive quadrant’ because the model balances performance with low cost.
One of the biggest upgrades in Gemini 3.1 Flash TTS is improved speech controllability. Users can guide how the AI speaks using natural language instructions. The model also introduces audio tags, which let users adjust vocal delivery, such as speaking pace, more precisely. ‘By embedding natural language commands directly into the text input, you can steer AI-speech output with improved levels of granularity,’ Google said.
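As a rough illustration of what embedding commands in the text input could look like, the Python sketch below places a natural-language style instruction in front of the text to be spoken. The helper name `build_tts_prompt`, the `instruction: text` layout and the bracketed `[pause]` tag are assumptions made for illustration only; Google's statement above does not spell out the exact tag syntax, so check the official documentation before relying on any of it.

```python
from typing import Optional


def build_tts_prompt(text: str, style: Optional[str] = None) -> str:
    """Return the text to speak, optionally prefixed with a style instruction.

    Hypothetical helper: the "instruction: text" layout is an assumption,
    not a documented requirement of Gemini 3.1 Flash TTS.
    """
    text = text.strip()
    if not style:
        return text
    # Drop a trailing colon so the separator is not doubled.
    instruction = style.strip().rstrip(":")
    return f"{instruction}: {text}"


# "[pause]" is a placeholder audio tag, used here only as an example.
prompt = build_tts_prompt(
    "Welcome back! [pause] Here is today's briefing.",
    style="Speak slowly, in a warm and calm tone",
)
print(prompt)
```

The resulting string would then be sent as the text input of a speech request. Keeping the instruction and the spoken text in one string mirrors the idea of steering the voice from inside the prompt itself.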
Another key feature is support for multi-speaker dialogue. Developers can create different characters with unique audio profiles. Gemini 3.1 Flash TTS also supports more than 70 languages. ‘Gemini 3.1 Flash TTS delivers high-fidelity speech and more precise control across more than 70 languages. These core optimisations bring advanced style, pacing and accent control to major markets,’ the tech giant said.
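For developers wondering how a multi-speaker script might be prepared, the sketch below turns a list of dialogue turns into a speaker-labelled transcript and checks that every character has a voice assigned. The `Turn` class, the `format_dialogue` function, the `Speaker: line` layout and the voice names are all hypothetical; they are not taken from Google's announcement or from the Gemini API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Turn:
    speaker: str  # character name, e.g. "Host"
    text: str     # the line that character speaks


def format_dialogue(turns: List[Turn], voices: Dict[str, str]) -> str:
    """Render turns as a 'Speaker: line' transcript (an assumed layout).

    Raises ValueError if any speaker has no entry in `voices`.
    """
    missing = sorted({t.speaker for t in turns} - set(voices))
    if missing:
        raise ValueError(f"no voice assigned for: {', '.join(missing)}")
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)


# Voice names below are placeholders, not real Gemini voice identifiers.
voices = {"Host": "voice-a", "Guest": "voice-b"}
script = [
    Turn("Host", "Thanks for joining us today."),
    Turn("Guest", "Happy to be here."),
]
print(format_dialogue(script, voices))
```

Validating the speaker-to-voice mapping before any request is made catches a misspelled character name early, rather than after audio has been generated.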
Note that all audio generated by Gemini 3.1 Flash TTS includes a SynthID watermark. This invisible watermark is embedded in the audio and helps detect AI-generated content.
Developers can access Gemini 3.1 Flash TTS in preview through the Gemini API and Google AI Studio. Enterprise users can use the model in preview through Vertex AI. Workspace users can access the new model via Google Vids.