elevenlabs
27 articles about ElevenLabs in AI news
ElevenLabs Voice Cloning API Priced from $5 to $1,320/Month
ElevenLabs' AI voice cloning service has published pricing tiers from $5 to $1,320 per month. This formalizes the cost structure for developers and businesses integrating synthetic speech.
VoxCPM2 Open-Source Voice AI Outperforms ElevenLabs on Key Benchmarks
Researchers from OpenBMB and Tsinghua University released VoxCPM2, a 2B-parameter open-source voice AI that clones voices from short clips and creates voices from text descriptions. It outperforms ElevenLabs on the Minimax-MLS benchmark and runs locally with no API costs.
Neuralink & ElevenLabs Demo AI Voice Restoration for Brain Implant User
Neuralink and voice AI firm ElevenLabs demonstrated a system that generates speech for a Neuralink patient who lost their voice. The demo shows a brain-computer interface decoding intended speech into a synthetic voice in real time.
Mistral AI Launches Voxtral TTS: 3B-Parameter Open-Source Model Claims 63% Win Rate Over ElevenLabs Flash v2.5
Mistral AI released Voxtral TTS, a 3-billion-parameter open-weights text-to-speech model. It reportedly outperforms ElevenLabs Flash v2.5 in human preference tests, runs on 3 GB RAM, and clones voices from 5 seconds of audio.
ElevenLabs Unleashes 'Flows': The Unified AI Creative Suite That Could Revolutionize Content Production
ElevenLabs has launched Flows, an AI platform that integrates image, video, voice, music, and sound effects generation into a single visual pipeline. This eliminates tool-switching and re-exporting, potentially transforming creative workflows.
Beyond Words: How ElevenLabs' Expressive Mode Is Teaching AI to Feel the Room
ElevenLabs has launched Expressive Mode for its ElevenAgents platform, enabling AI voice agents to dynamically adapt tone, emotion, and conversational timing based on context. This breakthrough in prosody and turn-taking aims to make AI interactions indistinguishable from human conversation.
Mistral AI Releases Voxtral TTS: 4B-Parameter Open-Weight Model Clones Voices from 3-Second Audio in 9 Languages
Mistral AI has launched Voxtral TTS, its first open-weight text-to-speech model. The 4B-parameter model clones voices from three seconds of reference audio across nine languages, with a latency of 70ms, and scored higher on naturalness than ElevenLabs Flash v2.5 in human tests.
LuxTTS Democratizes Voice Cloning: High-Quality Synthesis Now Runs on Consumer Hardware
LuxTTS, a new open-source text-to-speech model, enables realistic voice cloning from just 3 seconds of audio using only 1GB of VRAM. The system runs 150x faster than real time and produces 48kHz audio, challenging proprietary solutions like ElevenLabs.
Omar Sar Uses Opus 4.7 Agent to Turn Podcasts into Self-Improving Wikis
AI researcher Omar Sar automated podcast consumption using an Opus 4.7 agent that extracts insights, generates analysis, and builds interactive HTML/JS artifacts. The system creates a self-improving knowledge wiki for agentic research workflows.
MiniMax AI Powers Wati's Astra Voice 2.0 for WhatsApp Business
MiniMax AI is providing its voice technology to power Wati's Astra Voice 2.0 platform, enabling businesses to deploy conversational voice AI on WhatsApp in multiple languages.
OpenVoice v2: Complete Voice Cloning Directory Launches on GitHub
A developer has compiled and released a comprehensive directory of open-source voice cloning tools and resources on GitHub. This centralizes access to models, datasets, and training code, lowering the barrier to entry for AI audio development.
Google Launches Gemini 3.1 Flash TTS with Prompt-Controlled Speech
Google has launched Gemini 3.1 Flash TTS, a text-to-speech model featuring prompt-based voice control and support for over 70 languages. This release expands Google's multimodal AI offerings directly to developers.
OpenBMB's VoxCPM 2: 2B-Param Open-Source TTS for Multilingual Voice
OpenBMB launched VoxCPM 2, a 2-billion-parameter open-source text-to-speech model. It generates multilingual, emotionally expressive speech from text descriptions and runs on consumer-grade hardware.
OpenMontage: Open-Source Agentic Video Production System Costs $0.69 Per Ad
OpenMontage, an open-source agentic video production system, has been released. It orchestrates 11 pipelines and 49 tools across multiple AI providers to autonomously script, generate assets, edit, and render videos from a plain language prompt.
Pika Labs Launches 'AI Self' Chatbot for Newsletter Creator Kimmonismus
Kimmonismus, who runs an AI newsletter with 225K+ readers, has launched a custom chatbot trained on his industry knowledge and opinions using Pika Labs' technology. The 'AI Self' is designed to handle reader inquiries at scale.
Laid-Off Engineer Open-Sources AI Job Search System 'career-ops'
A developer created 'career-ops'—an open-source AI job search system that evaluates job offers, generates tailored application materials, and filters opportunities. The tool uses Claude Code to process job descriptions against a user's CV and has gained 8.2k GitHub stars.
Text-to-Speech Cost Plummets from $0.15/Word to Free Local Models Using 3GB RAM
Within 12 months, high-quality text-to-speech has shifted from a $0.15-per-word cloud service to free, local models requiring only 3GB of RAM, signaling a broader price collapse in AI inference.
Cohere Transcribe: 2B-Parameter Open-Source ASR Model Achieves 5.42% WER, Topping Hugging Face Leaderboard
Cohere released Transcribe, a 2B-parameter open-source speech recognition model. It claims a 5.42% average word error rate, beating OpenAI Whisper v3 and topping the Hugging Face Open ASR Leaderboard.
Prompt Master: Free, Open-Source Claude Skill Generates Optimized Prompts for 18+ AI Tools
A new, free, and open-source Claude skill called Prompt Master generates optimized prompts for over 18 AI tools—including ChatGPT, Midjourney, and Cursor—on the first attempt, aiming to reduce wasted credits and re-prompts.
Waves Audio Launches Lightning V3.1: 10-Second Voice Cloning with 44.1kHz Studio Quality
Waves Audio released Lightning V3.1, a voice cloning model that creates studio-quality voice replicas from just 10 seconds of audio with under 100ms latency. The update supports over 50 languages and targets real-time applications.
PodcastBrain: A Technical Breakdown of a Multi-Agent AI System That Learns User Preferences
A developer built PodcastBrain, an open-source, local AI podcast generator where two distinct agents debate any topic. The system learns user preferences via ratings and adjusts future content, demonstrating a working feedback loop with multi-agent orchestration.
Developer Builds AI Baby Monitor with Voice Cloning in Under 24 Hours Using DevKit
A developer created a working MVP of a smart baby monitor that clones a mother's voice to soothe a crying infant, completing the project in less than 24 hours after unboxing a new devkit.
Fish Audio S2 Enables Word-Level Speech Control with Positional Tags, Beats GPT-4o in Human Preference Tests
Fish Audio S2 is a 100% open-source TTS model that uses inline positional tags for word-level vocal control. It achieves 8/10 wins against GPT-4o and Gemini in human preference tests and generates audio nearly 5x faster than real time.
Tongyi Lab Releases World's First Open-Source Multi-Speaker AI Dubbing Model
Alibaba's Tongyi Lab has released the first open-source AI model capable of dubbing multi-speaker conversations, addressing one of the hardest problems in AI video generation. The model synchronizes voice with lip movements across multiple speakers in a single pass.
The Billion-Dollar Bet on AI World Models: How AMI's Funding Signals a New Era of Machine Understanding
AMI's $1 billion funding round for world model development highlights a strategic shift toward AI systems that understand physical reality. Meanwhile, robotics and creative AI tools see massive investments, with YouTube maintaining streaming dominance.
Modulate's Voice API Disrupts AI Transcription Market with 10-90x Cost Reduction
Startup Modulate has launched a voice transcription API that's 10-90x cheaper than established players like Deepgram and AssemblyAI. This dramatic price reduction could fundamentally reshape the economics of voice AI applications and make transcription technology accessible to a much broader market.
OpenAI's Audio Revolution: New Voice Models Signal Major AI Advancements
OpenAI appears poised to release new audio models that could significantly enhance voice interaction capabilities. This development follows recent trademark filings and suggests major improvements to voice mode technology.