voice synthesis
30 articles about voice synthesis in AI news
LuxTTS Democratizes Voice Cloning: High-Quality Synthesis Now Runs on Consumer Hardware
LuxTTS, a new open-source text-to-speech model, enables realistic voice cloning from just 3 seconds of audio using only 1GB of VRAM. The system operates 150x faster than real-time and produces 48kHz audio, challenging proprietary solutions like ElevenLabs.
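To put the 150x figure in concrete terms, here is a rough back-of-the-envelope sketch. The helper name `synthesis_seconds` is illustrative and not part of LuxTTS, and actual throughput will depend on hardware and input length.

```python
def synthesis_seconds(audio_seconds: float, speedup: float = 150.0) -> float:
    """Wall-clock time needed to generate `audio_seconds` of speech
    at a given speedup over real time (assumed constant)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# One minute of speech at the reported 150x speedup.
print(f"{synthesis_seconds(60.0):.2f} s")  # prints "0.40 s"
```

Under that assumption, a minute of audio would take well under a second to generate.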
MiniMax AI Powers Wati's Astra Voice 2.0 for WhatsApp Business
MiniMax AI is providing its voice technology to power Wati's Astra Voice 2.0 platform, enabling businesses to deploy conversational voice AI on WhatsApp in multiple languages.
OpenVoice v2: Complete Voice Cloning Directory Launches on GitHub
A developer has compiled and released a comprehensive directory of open-source voice cloning tools and resources on GitHub. This centralizes access to models, datasets, and training code, lowering the barrier to entry for AI audio development.
ElevenLabs Voice Cloning API Priced from $5 to $1,320/Month
ElevenLabs' AI voice cloning service has published pricing tiers from $5 to $1,320 per month. This formalizes the cost structure for developers and businesses integrating synthetic speech.
VoxCPM2 Open-Source Voice AI Outperforms ElevenLabs on Key Benchmarks
Researchers from OpenBMB and Tsinghua University released VoxCPM2, a 2B-parameter open-source voice AI that clones voices from short clips and creates voices from text descriptions. It outperforms ElevenLabs on the Minimax-MLS benchmark and runs locally with no API costs.
Neuralink & ElevenLabs Demo AI Voice Restoration for Brain Implant User
Neuralink and voice AI firm ElevenLabs demonstrated a system that generates speech for a Neuralink patient who lost their voice. The demo shows a brain-computer interface decoding intended speech into a synthetic voice in real time.
Building a Memory Layer for a Voice AI Agent: A Developer's Blueprint
A developer shares a technical case study on building a voice-first journal app, focusing on the critical memory layer. The article details using Redis Agent Memory Server for working/long-term memory and key latency optimizations like streaming APIs and parallel fetches to meet voice's strict responsiveness demands.
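The parallel-fetch optimization mentioned here can be sketched with `asyncio.gather`. This is a minimal illustration, not the article's code: the two fetch functions are hypothetical stand-ins that simulate network latency with `asyncio.sleep`, and no Redis Agent Memory Server API is used or implied.

```python
import asyncio
import time


async def fetch_working_memory(session_id: str) -> list[str]:
    # Hypothetical stand-in for a working-memory lookup;
    # the sleep simulates a 50 ms network round trip.
    await asyncio.sleep(0.05)
    return [f"recent turn for {session_id}"]


async def fetch_long_term_memory(user_id: str, query: str) -> list[str]:
    # Hypothetical stand-in for a long-term memory search.
    await asyncio.sleep(0.05)
    return [f"past entry for {user_id} matching {query!r}"]


async def build_context(session_id: str, user_id: str, query: str) -> dict[str, list[str]]:
    # Start both lookups concurrently so the total wait is roughly
    # the slower of the two rather than their sum.
    working, long_term = await asyncio.gather(
        fetch_working_memory(session_id),
        fetch_long_term_memory(user_id, query),
    )
    return {"working": working, "long_term": long_term}


if __name__ == "__main__":
    start = time.perf_counter()
    context = asyncio.run(build_context("s1", "u1", "sleep"))
    elapsed = time.perf_counter() - start
    print(context)
    print(f"fetched in {elapsed:.3f} s")
```

Running the two lookups sequentially would cost about the sum of their latencies, which matters when a voice response has to start within a few hundred milliseconds.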
GOLF.AI Launches 24/7 AI Concierge Agent for Pro Shop Bookings, Voiced by Nick Faldo
GOLF.AI has launched a 24/7 AI agent that handles tee time bookings and Q&A for golf pro shops, featuring a voice interface modeled after Sir Nick Faldo. This represents a direct application of AI agents in a high-touch, appointment-driven retail environment.
Mistral AI Releases Voxtral TTS: 4B-Parameter Open-Weight Model Clones Voices from 3-Second Audio in 9 Languages
Mistral AI has launched Voxtral TTS, its first open-weight text-to-speech model. The 4B-parameter model clones voices from three seconds of reference audio across nine languages, with a latency of 70ms, and scored higher on naturalness than ElevenLabs Flash v2.5 in human tests.
Waves Audio Launches Lightning V3.1: 10-Second Voice Cloning with 44.1kHz Studio Quality
Waves Audio released Lightning V3.1, a voice cloning model that creates studio-quality voice replicas from just 10 seconds of audio with under 100ms latency. The update supports over 50 languages and targets real-time applications.
Developer Builds AI Baby Monitor with Voice Cloning in Under 24 Hours Using DevKit
A developer created a working MVP of a smart baby monitor that clones a mother's voice to soothe a crying infant, completing the project in less than 24 hours after unboxing a new devkit.
Modulate's Voice API Disrupts AI Transcription Market with 10-90x Cost Reduction
Startup Modulate has launched a voice transcription API that's 10-90x cheaper than established players like Deepgram and AssemblyAI. A price cut of that size could change the economics of voice AI applications and make transcription accessible to a much broader market.
OpenAI's WebSocket Revolution: The End of AI Voice Lag and What It Means for Human-Computer Interaction
OpenAI has introduced a WebSocket mode for its API that reduces latency in voice AI interactions. It enables near-real-time conversations by removing the sequential processing bottlenecks of earlier voice AI systems.
Kyutai Labs Releases OVIE: Single-Image Novel View Synthesis Model
French AI lab Kyutai Labs released OVIE, a novel view generation model trained only on single images, bypassing the need for costly multi-view datasets. This could democratize 3D content creation from 2D photos.
OpenBMB's VoxCPM 2: 2B-Param Open-Source TTS for Multilingual Voice
OpenBMB launched VoxCPM 2, a 2-billion-parameter open-source text-to-speech model. It generates multilingual, emotionally expressive speech from text descriptions and runs on consumer-grade hardware.
OpenClaw Voice Interface Demo Shows Real-Time AI Assistant with Push-to-Talk Hardware
A developer demonstrated a custom hardware rig that uses a push-to-talk button to transcribe speech, query the OpenClaw AI model, and stream responses back in real time. The setup provides a tangible, voice-driven interface for interacting with open-source AI assistants.
HeyGen Launches CLI Tool for AI Video Generation from Terminal
AI video platform HeyGen has launched a CLI tool, allowing users to generate videos with avatars, voice, and script via terminal commands. This moves video synthesis from a web dashboard into developer workflows.
Google Launches Gemini 3.1 Flash TTS with Prompt-Controlled Speech
Google has launched Gemini 3.1 Flash TTS, a text-to-speech model featuring prompt-based voice control and support for over 70 languages. This release expands Google's multimodal AI offerings directly to developers.
ElevenLabs Unleashes 'Flows': The Unified AI Creative Suite That Could Revolutionize Content Production
ElevenLabs has launched Flows, an AI platform that combines image, video, voice, music, and sound effects generation in a single visual pipeline. This removes tool-switching and re-exporting between steps, which could streamline creative workflows.
AI Phone Assistants Reach New Milestone: Autonomous Call-Handling Goes Mainstream
A new AI system can now answer phone calls autonomously, moving beyond chatbots to handle real-time conversations. This development represents a significant leap in voice AI capabilities and practical automation.
RunAnywhere's MetalRT Engine Delivers Breakthrough AI Performance on Apple Silicon
RunAnywhere has launched MetalRT, a proprietary GPU inference engine that dramatically accelerates on-device AI workloads on Apple Silicon. Their open-source RCLI tool demonstrates sub-200ms voice AI pipelines, outperforming existing solutions like llama.cpp and Apple's MLX.
EXCLUSIVE Q&A: Bain & Co. Analyzes Next-Gen AI in Retail Marketing
Consulting giant Bain & Company provides expert analysis on the evolution of AI in retail marketing, detailing how next-generation generative AI is shifting from operational efficiency to driving personalized engagement and growth.
ByteDance's OmniShow Unifies Text, Image, Audio, Pose for Video Gen
ByteDance introduced OmniShow, a unified multimodal framework for video generation that accepts text, reference images, audio, and pose inputs simultaneously. It claims state-of-the-art performance across diverse conditioning settings.
AI Reshapes Luxury Travel—But Human Expertise Remains Essential
A new report highlights how AI is being integrated into luxury travel for personalized itineraries, predictive service, and backend operations. However, the consensus is that AI should augment, not replace, the human expertise and emotional intelligence that define true luxury service.
Pioneer Agent: A Closed-Loop System for Automating Small Language Model
Researchers present Pioneer Agent, a system that automates the adaptation of small language models to specific tasks. It handles data curation, failure diagnosis, and iterative training, showing significant performance gains in benchmarks and production-style deployments. This addresses a major engineering bottleneck for deploying efficient, specialized AI.
A Practical Guide to Fine-Tuning an LLM on RunPod H100 GPUs with QLoRA
A technical tutorial on using QLoRA for parameter-efficient fine-tuning of an LLM on RunPod's cloud H100 GPUs. It focuses on the practical setup and execution steps for engineers.
Game Studios Show Wide Variance in AI Adoption, Wharton Report Finds
A Wharton School report, based on interviews at 20 game studios, finds a wide spectrum of organizational approaches to adopting generative AI tools, from aggressive integration to active resistance.
OpenBMB Launches VoxCPM 2, an Open-Source TTS Model Rivaling Qwen3-TTS
OpenBMB has launched VoxCPM 2, an open-source text-to-speech AI model from China. The release is positioned as a direct competitor to Alibaba's Qwen3-TTS, expanding the open-source TTS landscape.
CRM Platforms Are Evolving into AI Agent Hubs
CRM systems like Salesforce and HubSpot are becoming platforms for deploying and managing AI agents. This shift enables automated, multi-step customer interactions directly within the customer data environment.
X Post Reveals Audible Quality Differences in GPU vs. NPU AI Inference
A developer demonstrated audible quality differences in AI text-to-speech output when run on GPU, CPU, and NPU hardware, highlighting a key efficiency vs. fidelity trade-off for on-device AI.