voice cloning
26 articles about voice cloning in AI news
OpenVoice v2: Complete Voice Cloning Directory Launches on GitHub
A developer has compiled and released a comprehensive directory of open-source voice cloning tools and resources on GitHub. This centralizes access to models, datasets, and training code, lowering the barrier to entry for AI audio development.
ElevenLabs Voice Cloning API Priced from $5 to $1,320/Month
ElevenLabs' AI voice cloning service has published pricing tiers from $5 to $1,320 per month. This formalizes the cost structure for developers and businesses integrating synthetic speech.
Waves Audio Launches Lightning V3.1: 10-Second Voice Cloning with 44.1kHz Studio Quality
Waves Audio released Lightning V3.1, a voice cloning model that creates studio-quality voice replicas from just 10 seconds of audio with under 100ms latency. The update supports over 50 languages and targets real-time applications.
LuxTTS Democratizes Voice Cloning: High-Quality Synthesis Now Runs on Consumer Hardware
LuxTTS, a new open-source text-to-speech model, enables realistic voice cloning from just 3 seconds of audio using only 1GB of VRAM. The system operates 150x faster than real-time and produces 48kHz audio, challenging proprietary solutions like ElevenLabs.
Developer Builds AI Baby Monitor with Voice Cloning in Under 24 Hours Using DevKit
A developer created a working MVP of a smart baby monitor that clones a mother's voice to soothe a crying infant, completing the project in less than 24 hours after unboxing a new devkit.
VoxCPM2 Open-Source Voice AI Outperforms ElevenLabs on Key Benchmarks
Researchers from OpenBMB and Tsinghua University released VoxCPM2, a 2B-parameter open-source voice AI that clones voices from short clips and creates voices from text descriptions. It outperforms ElevenLabs on the Minimax-MLS benchmark and runs locally with no API costs.
GOLF.AI Launches 24/7 AI Concierge Agent for Golf Pro Shops, Voiced by Nick Faldo
GOLF.AI has introduced the GOLF.AI CONCIERGE Agent, an AI-powered voice assistant designed to serve as the primary contact for golf pro shops. It manages tee time bookings and answers customer queries around the clock, utilizing a licensed voice model of six-time major champion Sir Nick Faldo.
Neuralink & ElevenLabs Demo AI Voice Restoration for Brain Implant User
Neuralink and voice AI firm ElevenLabs demonstrated a system that generates speech for a Neuralink patient who lost their voice. The demo shows a brain-computer interface decoding intended speech into synthetic voice in real-time.
Mistral AI Releases Voxtral TTS: 4B-Parameter Open-Weight Model Clones Voices from 3-Second Audio in 9 Languages
Mistral AI has launched Voxtral TTS, its first open-weight text-to-speech model. The 4B-parameter model clones voices from three seconds of reference audio across nine languages, with a latency of 70ms, and scored higher on naturalness than ElevenLabs Flash v2.5 in human tests.
OpenAI's Audio Revolution: New Voice Models Signal Major AI Advancements
OpenAI appears poised to release new audio models that could significantly enhance voice interaction capabilities. This development follows recent trademark filings and suggests major improvements to voice mode technology.
OpenBMB's VoxCPM 2: 2B-Param Open-Source TTS for Multilingual Voice
OpenBMB launched VoxCPM 2, a 2-billion-parameter open-source text-to-speech model. It generates multilingual, emotionally expressive speech from text descriptions and runs on consumer-grade hardware.
Google Launches Gemini 3.1 Flash TTS with Prompt-Controlled Speech
Google has launched Gemini 3.1 Flash TTS, a text-to-speech model featuring prompt-based voice control and support for over 70 languages. This release expands Google's multimodal AI offerings directly to developers.
HeyGen Launches CLI Tool for AI Video Generation from Terminal
AI video platform HeyGen has launched a CLI tool, allowing users to generate videos with avatars, voice, and script via terminal commands. This moves video synthesis from a web dashboard into developer workflows.
Mistral AI Launches Voxtral TTS: 3B-Parameter Open-Source Model Claims 63% Win Rate Over ElevenLabs Flash v2.5
Mistral AI released Voxtral TTS, a 3-billion-parameter open-weights text-to-speech model. It reportedly outperforms ElevenLabs Flash v2.5 in human preference tests, runs on 3 GB RAM, and clones voices from 5 seconds of audio.
Tongyi Lab Releases World's First Open-Source Multi-Speaker AI Dubbing Model
Alibaba's Tongyi Lab has released the first open-source AI model capable of dubbing multi-speaker conversations, addressing one of the hardest problems in AI video generation. The model synchronizes voice with lip movements across multiple speakers in a single pass.
Miso One: 8B Open-Source TTS Hits 110ms Latency, Real Emotion
Miso One, an 8B open-source TTS model, achieves 110ms latency with emotional range. Weights are fully open-source for self-hosting, but no benchmark data is provided.
mlx-audio v0.4.3 Ships 6 New TTS Models, Slimmer Deps
mlx-audio v0.4.3 adds 6 TTS models, server concurrency, and slims dependencies, targeting Apple Silicon developers.
WeClone: Open-Source Tool Fine-Tunes AI Clones from Chat Logs
WeClone is an open-source tool that processes exported chat logs to fine-tune an LLM, creating a personalized AI clone. It has gained 16.4K stars on GitHub, offering a free, self-hosted alternative to commercial digital twin services.
Pika Labs Launches 'AI Self' Chatbot for Newsletter Creator Kimmonismus
Kimmonismus, who runs an AI newsletter with 225K+ readers, has launched a custom chatbot trained on his industry knowledge and opinions using Pika Labs' technology. The 'AI Self' is designed to handle reader inquiries at scale.
HeyGen Launches Avatar Engine, Open-Source Renderer & 175-Language Dubbing
HeyGen's major 2026 update includes a new avatar engine, an open-source video renderer, and 175-language dubbing capabilities, expanding its AI video generation platform for enterprise and creator use.
OpenBMB Launches VoxCPM 2, an Open-Source TTS Model Rivaling Qwen3-TTS
OpenBMB has launched VoxCPM 2, an open-source text-to-speech AI model from China. The release is positioned as a direct competitor to Alibaba's Qwen3-TTS, expanding the open-source TTS landscape.
Text-to-Speech Cost Plummets from $0.15/Word to Free Local Models Using 3GB RAM
High-quality text-to-speech has shifted from a $0.15 per word cloud service to free, local models requiring only 3GB of RAM in 12 months, signaling a broader price collapse in AI inference.
Fanvue Emerges as Primary Platform for AI-Generated Influencers, Explicitly Allowing Synthetic Creator Accounts
Fanvue, a subscription content platform, has positioned itself as the primary destination for AI-generated influencer accounts, explicitly permitting creators to monetize synthetic personas. This formalizes a niche market for AI-driven adult and influencer content.
POP.STORE Launches ECHO-ME: An Agentic AI Commerce Platform for Creators
POP.STORE announced ECHO-ME, an agentic AI platform designed to autonomously run a creator's business operations. It monitors social channels, detects brand deals, and converts fan interactions into revenue, launching with 15,000 creators. This represents a shift from task automation to full business operation for the solo creator economy.
Fish Audio S2 Enables Word-Level Speech Control with Positional Tags, Beats GPT-4o in Human Preference Tests
Fish Audio S2 introduces a 100% open-source TTS model that uses inline positional tags for word-level vocal control, achieving 8/10 wins against GPT-4o and Gemini in human preference tests while generating audio nearly 5x faster than real-time.
The Uncanny Valley of Truth: How AI Avatars Are Blurring Reality's Edge
AI avatars now replicate human speech patterns, facial expressions, and gestures with unsettling accuracy, creating synthetic personas indistinguishable from real people. This technological leap raises urgent questions about authenticity, trust, and the future of digital communication.