speech synthesis

28 articles about speech synthesis in AI news

Google Launches Gemini 3.1 Flash TTS with Prompt-Controlled Speech

Google has launched Gemini 3.1 Flash TTS, a text-to-speech model featuring prompt-based voice control and support for over 70 languages. This release expands Google's multimodal AI offerings directly to developers.

Apr 15, 202693% relevant

Microsoft Open-Sources VALL-E 2: A Zero-Shot TTS Model Achieving Human Parity in Speech Naturalness

Microsoft Research has open-sourced VALL-E 2, a neural codec language model for text-to-speech that achieves human parity in naturalness. It uses a novel 'Repetition-Aware Sampling' method to eliminate word repetition, a common failure mode in prior models.

Mar 30, 202695% relevant

Text-to-Speech Cost Plummets from $0.15/Word to Free Local Models Using 3GB RAM

High-quality text-to-speech has shifted from a $0.15 per word cloud service to free, local models requiring only 3GB of RAM in 12 months, signaling a broader price collapse in AI inference.

Mar 30, 202685% relevant

LuxTTS Democratizes Voice Cloning: High-Quality Synthesis Now Runs on Consumer Hardware

LuxTTS, a new open-source text-to-speech model, enables realistic voice cloning from just 3 seconds of audio using only 1GB of VRAM. The system operates 150x faster than real-time and produces 48kHz audio, challenging proprietary solutions like ElevenLabs.

Mar 11, 202695% relevant

OpenBMB's VoxCPM 2: 2B-Param Open-Source TTS for Multilingual Voice

OpenBMB launched VoxCPM 2, a 2-billion-parameter open-source text-to-speech model. It generates multilingual, emotionally expressive speech from text descriptions and runs on consumer-grade hardware.

Apr 13, 202697% relevant

ElevenLabs Voice Cloning API Priced from $5 to $1,320/Month

ElevenLabs' AI voice cloning service has published pricing tiers from $5 to $1,320 per month. This formalizes the cost structure for developers and businesses integrating synthetic speech.

Apr 10, 202687% relevant

OpenBMB Launches VoxCPM 2, an Open-Source TTS Model Rivaling Qwen3-TTS

OpenBMB has launched VoxCPM 2, an open-source text-to-speech AI model from China. The release is positioned as a direct competitor to Alibaba's Qwen3-TTS, expanding the open-source TTS landscape.

Apr 7, 202691% relevant

Neuralink & ElevenLabs Demo AI Voice Restoration for Brain Implant User

Neuralink and voice AI firm ElevenLabs demonstrated a system that generates speech for a Neuralink patient who lost their voice. The demo shows a brain-computer interface decoding intended speech into synthetic voice in real-time.

Apr 6, 202685% relevant

X Post Reveals Audible Quality Differences in GPU vs. NPU AI Inference

A developer demonstrated audible quality differences in AI text-to-speech output when run on GPU, CPU, and NPU hardware, highlighting a key efficiency vs. fidelity trade-off for on-device AI.

Apr 5, 202675% relevant

Mistral AI Releases Voxtral TTS: 4B-Parameter Open-Weight Model Clones Voices from 3-Second Audio in 9 Languages

Mistral AI has launched Voxtral TTS, its first open-weight text-to-speech model. The 4B-parameter model clones voices from three seconds of reference audio across nine languages, with a latency of 70ms, and scored higher on naturalness than ElevenLabs Flash v2.5 in human tests.

Mar 26, 202695% relevant

OpenClaw Voice Interface Demo Shows Real-Time AI Assistant with Push-to-Talk Hardware

A developer demonstrated a custom hardware rig that uses a push-to-talk button to transcribe speech, query the OpenClaw AI model, and stream responses back in real-time. The setup provides a tangible, hands-free interface for interacting with open-source AI assistants.

Mar 23, 202685% relevant

Whisper's Real-Time Translation Demo Shows Practical Progress Toward Universal Translation

OpenAI's Whisper model demonstrated real-time translation from English to Spanish, showcasing progress toward practical universal translation tools. The demo highlights incremental but meaningful improvements in speech-to-speech translation latency and quality.

Mar 18, 202685% relevant

OmniForcing Enables Real-Time Joint Audio-Visual Generation at 25 FPS with 0.7s Latency

Researchers introduced OmniForcing, a method that distills a bidirectional LTX-2 model into a causal streaming generator for joint audio-visual synthesis. It achieves ~25 FPS with 0.7s latency, a 35× speedup over offline diffusion models while maintaining multi-modal fidelity.

Mar 16, 202692% relevant

The Uncanny Valley of Truth: How AI Avatars Are Blurring Reality's Edge

AI avatars now replicate human speech patterns, facial expressions, and gestures with unsettling accuracy, creating synthetic personas indistinguishable from real people. This technological leap raises urgent questions about authenticity, trust, and the future of digital communication.

Feb 27, 202685% relevant

Gemini 3.5 Live Translate Debuts as Real-Time Audio Model

Google DeepMind released Gemini 3.5 Live Translate, an audio model for real-time translation, but disclosed no pricing, latency, or language pair details.

Jun 9, 202687% relevant

MiniMax Music-2.6 Goes Free on Cloudflare This Week

MiniMax's Music-2.6 AI model is available for free on Cloudflare's platform this week, allowing users to generate full-length songs or instrumentals from text prompts.

Apr 27, 202675% relevant

GPT-5.5 Generates Complex SVG in Single Prompt, User Reports

A developer shared that OpenAI's GPT-5.5 produced a sophisticated SVG image from a single prompt. This suggests improvements in the model's ability to generate precise, structured visual code.

Apr 19, 202685% relevant

MiniMax AI Powers Wati's Astra Voice 2.0 for WhatsApp Business

MiniMax AI is providing its voice technology to power Wati's Astra Voice 2.0 platform, enabling businesses to deploy conversational voice AI on WhatsApp in multiple languages.

Apr 16, 202685% relevant

OpenVoice v2: Complete Voice Cloning Directory Launches on GitHub

A developer has compiled and released a comprehensive directory of open-source voice cloning tools and resources on GitHub. This centralizes access to models, datasets, and training code, lowering the barrier to entry for AI audio development.

Apr 16, 202685% relevant

VoxCPM2 Open-Source Voice AI Outperforms ElevenLabs on Key Benchmarks

Researchers from OpenBMB and Tsinghua University released VoxCPM2, a 2B-parameter open-source voice AI that clones voices from short clips and creates voices from text descriptions. It outperforms ElevenLabs on the Minimax-MLS benchmark and runs locally with no API costs.

Apr 10, 202695% relevant

Building a Memory Layer for a Voice AI Agent: A Developer's Blueprint

A developer shares a technical case study on building a voice-first journal app, focusing on the critical memory layer. The article details using Redis Agent Memory Server for working/long-term memory and key latency optimizations like streaming APIs and parallel fetches to meet voice's strict responsiveness demands.

Apr 4, 202676% relevant

PodcastBrain: A Technical Breakdown of a Multi-Agent AI System That Learns User Preferences

A developer built PodcastBrain, an open-source, local AI podcast generator where two distinct agents debate any topic. The system learns user preferences via ratings and adjusts future content, demonstrating a working feedback loop with multi-agent orchestration.

Mar 23, 202670% relevant

Modulate's Voice API Disrupts AI Transcription Market with 10-90x Cost Reduction

Startup Modulate has launched a voice transcription API that's 10-90x cheaper than established players like Deepgram and AssemblyAI. This dramatic price reduction could fundamentally reshape the economics of voice AI applications and make transcription technology accessible to a much broader market.

Mar 12, 202695% relevant

AI Phone Assistants Reach New Milestone: Autonomous Call-Handling Goes Mainstream

A new AI system can now answer phone calls autonomously, moving beyond chatbots to handle real-time conversations. This development represents a significant leap in voice AI capabilities and practical automation.

Mar 11, 202687% relevant

RunAnywhere's MetalRT Engine Delivers Breakthrough AI Performance on Apple Silicon

RunAnywhere has launched MetalRT, a proprietary GPU inference engine that dramatically accelerates on-device AI workloads on Apple Silicon. Their open-source RCLI tool demonstrates sub-200ms voice AI pipelines, outperforming existing solutions like llama.cpp and Apple's MLX.

Mar 10, 202680% relevant

AI Research Breakthroughs: From Video Reasoning to Self-Stopping Models

This week's top AI papers reveal major advances in video understanding, reasoning efficiency, and agent training. Researchers introduced a massive video reasoning dataset, models that know when to stop thinking, and techniques for improving AI agents without full retraining.

Mar 1, 202695% relevant

OpenAI's WebSocket Revolution: The End of AI Voice Lag and What It Means for Human-Computer Interaction

OpenAI has introduced WebSocket mode for its API, dramatically reducing latency in voice AI interactions. This technical breakthrough enables near-real-time conversations by eliminating the sequential processing bottlenecks that plagued previous voice AI systems.

Feb 23, 202675% relevant

Meta's Digital Afterlife: AI That Inherits Your Social Media Identity

Meta has patented technology allowing AI to assume control of deceased users' accounts, continuing to post and interact as if they were still alive. This raises profound questions about digital legacy, consent, and the nature of memory in the AI age.

Feb 16, 202685% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety