elevenlabs
17 articles about ElevenLabs in AI news
Neuralink & ElevenLabs Demo AI Voice Restoration for Brain Implant User
Neuralink and voice AI firm ElevenLabs demonstrated a system that generates speech for a Neuralink patient who lost their voice. The demo shows a brain-computer interface decoding intended speech into a synthetic voice in real time.
Mistral AI Launches Voxtral TTS: 3B-Parameter Open-Source Model Claims 63% Win Rate Over ElevenLabs Flash v2.5
Mistral AI released Voxtral TTS, a 3-billion-parameter open-weights text-to-speech model. It reportedly outperforms ElevenLabs Flash v2.5 in human preference tests, runs on 3 GB RAM, and clones voices from 5 seconds of audio.
ElevenLabs Unleashes 'Flows': The Unified AI Creative Suite That Could Revolutionize Content Production
ElevenLabs has launched Flows, a groundbreaking AI platform that seamlessly integrates image, video, voice, music, and sound effects generation into a single visual pipeline. This eliminates tool-switching and re-exporting, potentially transforming creative workflows.
Beyond Words: How ElevenLabs' Expressive Mode Is Teaching AI to Feel the Room
ElevenLabs has launched Expressive Mode for its ElevenAgents platform, enabling AI voice agents to dynamically adapt tone, emotion, and conversational timing based on context. This breakthrough in prosody and turn-taking aims to make AI interactions indistinguishable from human conversation.
Mistral AI Releases Voxtral TTS: 4B-Parameter Open-Weight Model Clones Voices from 3-Second Audio in 9 Languages
Mistral AI has launched Voxtral TTS, its first open-weight text-to-speech model. The 4B-parameter model clones voices from three seconds of reference audio across nine languages, with a latency of 70ms, and scored higher on naturalness than ElevenLabs Flash v2.5 in human tests.
LuxTTS Democratizes Voice Cloning: High-Quality Synthesis Now Runs on Consumer Hardware
LuxTTS, a new open-source text-to-speech model, enables realistic voice cloning from just 3 seconds of audio using only 1GB of VRAM. The system runs 150x faster than real time and produces 48kHz audio, challenging proprietary solutions like ElevenLabs.
Text-to-Speech Cost Plummets from $0.15/Word to Free Local Models Using 3GB RAM
In 12 months, high-quality text-to-speech has shifted from cloud services charging $0.15 per word to free local models requiring only 3GB of RAM, signaling a broader price collapse in AI inference.
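To put the price collapse in perspective, a rough back-of-the-envelope comparison. The $0.15-per-word figure comes from the article; the word count is an illustrative assumption (a typical full-length audiobook):

```python
# Rough cost comparison: cloud TTS at $0.15/word vs. a free local model.
# The per-word price is from the article; the 80,000-word audiobook
# length is an illustrative assumption.

words = 80_000
cloud_cost = words * 0.15   # cloud TTS bill in dollars
local_cost = 0.0            # open-weights model, ignoring electricity

print(f"Cloud TTS: ${cloud_cost:,.2f}")  # Cloud TTS: $12,000.00
print(f"Local TTS: ${local_cost:,.2f}")  # Local TTS: $0.00
```

At that spread, a single audiobook covers the cost of the consumer hardware the local models run on many times over.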
Cohere Transcribe: 2B-Parameter Open-Source ASR Model Achieves 5.42% WER, Topping Hugging Face Leaderboard
Cohere released Transcribe, a 2B-parameter open-source speech recognition model. It claims a 5.42% average word error rate, beating OpenAI Whisper v3 and topping the Hugging Face Open ASR Leaderboard.
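Word error rate, the metric cited above, is the word-level edit distance between the reference and hypothesis transcripts, divided by the number of reference words. A minimal sketch of the standard computation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 error / 6 words ≈ 0.167
```

A 5.42% WER means roughly one word in eighteen is wrong relative to the reference transcript, averaged across the leaderboard's test sets.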
Prompt Master: Free, Open-Source Claude Skill Generates Optimized Prompts for 18+ AI Tools
A new, free, and open-source Claude skill called Prompt Master generates optimized prompts for over 18 AI tools—including ChatGPT, Midjourney, and Cursor—on the first attempt, aiming to reduce wasted credits and re-prompts.
Waves Audio Launches Lightning V3.1: 10-Second Voice Cloning with 44.1kHz Studio Quality
Waves Audio released Lightning V3.1, a voice cloning model that creates studio-quality voice replicas from just 10 seconds of audio with under 100ms latency. The update supports over 50 languages and targets real-time applications.
PodcastBrain: A Technical Breakdown of a Multi-Agent AI System That Learns User Preferences
A developer built PodcastBrain, an open-source, local AI podcast generator where two distinct agents debate any topic. The system learns user preferences via ratings and adjusts future content, demonstrating a working feedback loop with multi-agent orchestration.
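The ratings-driven feedback loop described above can be sketched in a few lines. All names here are hypothetical illustrations, not PodcastBrain's actual code: each user rating nudges a per-topic weight, which then biases which topic the agents debate next:

```python
import random

# Hypothetical sketch of a ratings-based preference loop; this is an
# illustration, not PodcastBrain's actual implementation.
class PreferenceModel:
    def __init__(self, topics, lr=0.2):
        self.weights = {t: 1.0 for t in topics}  # start with uniform preference
        self.lr = lr

    def rate(self, topic, stars):
        # Map a 1-5 star rating to a signal in [-1, 1] and nudge the weight,
        # keeping it above a small floor so no topic disappears entirely.
        signal = (stars - 3) / 2
        self.weights[topic] = max(0.05, self.weights[topic] + self.lr * signal)

    def pick_topic(self):
        # Sample the next episode topic proportionally to learned weights.
        topics = list(self.weights)
        return random.choices(topics, weights=[self.weights[t] for t in topics])[0]

model = PreferenceModel(["ai", "history", "sports"])
model.rate("ai", 5)       # user loved the AI episode -> weight rises
model.rate("sports", 1)   # disliked the sports one   -> weight falls
print(model.weights)
```

Sampling rather than always picking the top-weighted topic keeps some exploration in the loop, so the system can still discover preferences the user hasn't rated yet.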
Developer Builds AI Baby Monitor with Voice Cloning in Under 24 Hours Using DevKit
A developer created a working MVP of a smart baby monitor that clones a mother's voice to soothe a crying infant, completing the project in less than 24 hours after unboxing a new devkit.
Fish Audio S2 Enables Word-Level Speech Control with Positional Tags, Beats GPT-4o in Human Preference Tests
Fish Audio S2 is a fully open-source TTS model that uses inline positional tags for word-level vocal control, achieving 8/10 wins against GPT-4o and Gemini in human preference tests while generating audio nearly 5x faster than real time.
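The article does not specify Fish Audio S2's actual tag syntax, but the general idea of inline word-level control tags can be illustrated with a made-up format such as `<emph>` or `<pause=300ms>` embedded in the text. The parser and tag vocabulary below are assumptions for illustration only:

```python
import re

# Hypothetical inline-tag format for word-level speech control; the real
# Fish Audio S2 syntax is not described in the article.
TAG = re.compile(r"<(\w+)(?:=(\w+))?>")

def parse_controls(text):
    """Split tagged text into (word, active_controls) pairs."""
    controls, out = {}, []
    for token in text.split():
        m = TAG.fullmatch(token)
        if m:
            # Collect the tag; value-less tags become boolean flags.
            controls = {**controls, m.group(1): m.group(2) or True}
        else:
            out.append((token, dict(controls)))
            controls = {}  # in this sketch, tags apply only to the next word
    return out

print(parse_controls("hello <emph> world <pause=300ms> again"))
# [('hello', {}), ('world', {'emph': True}), ('again', {'pause': '300ms'})]
```

The point of positional tags is that control attaches to a specific word rather than to the whole utterance, which is what enables word-level direction of emphasis, pauses, and delivery.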
Tongyi Lab Releases World's First Open-Source Multi-Speaker AI Dubbing Model
Alibaba's Tongyi Lab has released the first open-source AI model capable of dubbing multi-speaker conversations, addressing one of the hardest problems in AI video generation. The model synchronizes voice with lip movements across multiple speakers in a single pass.
The Billion-Dollar Bet on AI World Models: How AMI's Funding Signals a New Era of Machine Understanding
AMI's $1 billion funding round for world model development highlights a strategic shift toward AI systems that understand physical reality. Meanwhile, robotics and creative AI tools see massive investments, with YouTube maintaining streaming dominance.
Modulate's Voice API Disrupts AI Transcription Market with 10-90x Cost Reduction
Startup Modulate has launched a voice transcription API that's 10-90x cheaper than established players like Deepgram and AssemblyAI. This dramatic price reduction could fundamentally reshape the economics of voice AI applications and make transcription technology accessible to a much broader market.
OpenAI's Audio Revolution: New Voice Models Signal Major AI Advancements
OpenAI appears poised to release new audio models that could significantly enhance voice interaction capabilities. This development follows recent trademark filings and suggests major improvements to voice mode technology.