Voice AI
30 articles about voice AI in AI news
VoxCPM2 Open-Source Voice AI Outperforms ElevenLabs on Key Benchmarks
Researchers from OpenBMB and Tsinghua University released VoxCPM2, a 2B-parameter open-source voice AI that clones voices from short clips and creates voices from text descriptions. It outperforms ElevenLabs on the Minimax-MLS benchmark and runs locally with no API costs.
Alibaba's Qwen 3.5 Omni Targets Western Market with Advanced Voice AI and Strategic Messaging
Alibaba's Qwen 3.5 Omni model features a robust voice AI that handles interruptions naturally, while its launch presentation signals a direct push to compete in Western markets as a cost-effective alternative.
Building a Memory Layer for a Voice AI Agent: A Developer's Blueprint
A developer shares a technical case study on building a voice-first journal app, focusing on the critical memory layer. The article details using Redis Agent Memory Server for working/long-term memory and key latency optimizations like streaming APIs and parallel fetches to meet voice's strict responsiveness demands.
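The "parallel fetches" idea mentioned in this summary can be illustrated with a minimal sketch. It assumes two independent memory lookups that are awaited concurrently rather than one after the other. The function names and the simulated 50 ms delays are hypothetical stand-ins, not the article's code or the Redis Agent Memory Server API.

```python
import asyncio
import time


async def fetch_working_memory(session_id: str) -> list[str]:
    # Hypothetical stand-in for a working-memory lookup; the sleep simulates I/O.
    await asyncio.sleep(0.05)
    return [f"{session_id}: last user turn"]


async def fetch_long_term_memory(user_id: str) -> list[str]:
    # Hypothetical stand-in for a long-term memory search.
    await asyncio.sleep(0.05)
    return [f"{user_id}: prefers morning entries"]


async def build_context(session_id: str, user_id: str) -> dict[str, list[str]]:
    # Start both lookups at once; total wait is roughly the slower of the two,
    # not their sum.
    working, long_term = await asyncio.gather(
        fetch_working_memory(session_id),
        fetch_long_term_memory(user_id),
    )
    return {"working": working, "long_term": long_term}


if __name__ == "__main__":
    start = time.perf_counter()
    context = asyncio.run(build_context("s1", "u1"))
    elapsed = time.perf_counter() - start
    print(context)
    print(f"elapsed: {elapsed:.3f}s")
```

With sequential awaits the two simulated lookups would take about 100 ms in total; gathered, they take about as long as the slower one, which is the kind of saving that matters for voice responsiveness.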
Neuralink & ElevenLabs Demo AI Voice Restoration for Brain Implant User
Neuralink and voice AI firm ElevenLabs demonstrated a system that generates speech for a Neuralink patient who lost their voice. The demo shows a brain-computer interface decoding intended speech into a synthetic voice in real time.
Modulate's Voice API Disrupts AI Transcription Market with 10-90x Cost Reduction
Startup Modulate has launched a voice transcription API that's 10-90x cheaper than established players like Deepgram and AssemblyAI. This dramatic price reduction could fundamentally reshape the economics of voice AI applications and make transcription technology accessible to a much broader market.
OpenAI's WebSocket Revolution: The End of AI Voice Lag and What It Means for Human-Computer Interaction
OpenAI has introduced WebSocket mode for its API, dramatically reducing latency in voice AI interactions. This technical breakthrough enables near-real-time conversations by eliminating the sequential processing bottlenecks that plagued previous voice AI systems.
AI Phone Assistants Reach New Milestone: Autonomous Call-Handling Goes Mainstream
A new AI system can now answer phone calls autonomously, moving beyond chatbots to handle real-time conversations. This development represents a significant leap in voice AI capabilities and practical automation.
RunAnywhere's MetalRT Engine Delivers Breakthrough AI Performance on Apple Silicon
RunAnywhere has launched MetalRT, a proprietary GPU inference engine that dramatically accelerates on-device AI workloads on Apple Silicon. Their open-source RCLI tool demonstrates sub-200ms voice AI pipelines, outperforming existing solutions like llama.cpp and Apple's MLX.
OpenClaw Voice Interface Demo Shows Real-Time AI Assistant Hardware
A developer showcased a custom hardware rig that integrates a push-button voice interface with the OpenClaw AI model, streaming responses in real time. This demonstrates a tangible, open-source alternative to proprietary voice assistants like Amazon Alexa.
TaxHacker: Open-Source AI Accounting App for Self-Hosted Receipt & Invoice Parsing
TaxHacker is a 100% open-source AI accounting application that users can self-host to automatically extract data from financial documents. It processes receipts, invoices, and PDFs in any language or currency, storing the structured data locally without sending it to external servers.
GOLF.AI Launches 24/7 AI Concierge Agent for Golf Pro Shops, Voiced by Nick Faldo
GOLF.AI has introduced the GOLF.AI CONCIERGE Agent, an AI-powered voice assistant designed to serve as the primary contact for golf pro shops. It manages tee time bookings and answers customer queries around the clock, utilizing a licensed voice model of six-time major champion Sir Nick Faldo.
FDA-Designated AI 'Vox' Detects Heart Failure from 5-Second Voice Clip
An AI tool named Vox can detect signs of worsening heart failure from a 5-second patient voice clip. It is trained on more than 3 million voice samples and backed by five clinical trials, targeting a condition that affects 64 million people globally.
Microsoft Expands AI Portfolio with New Speech and Voice Models
Microsoft has released MAI-Transcribe-1, a new speech-to-text model, and made its in-house MAI-Voice-1 and MAI-Image-2 models available. This expansion represents Microsoft's continued diversification beyond its OpenAI partnership, strengthening its position in the competitive AI market.
Typeless Launches AI Voice-to-Text Tool Claiming 4x Speed Boost Over Typing
Typeless, a new AI tool, converts speech into polished, formatted text directly within any application. The company claims it is 4x faster than manual typing.
Clawdbot AI Agent Autonomously Transcribes & Replies to Voice Messages Using Whisper API
A user demonstrated Clawdbot, an AI agent, autonomously handling a voice message: detecting its Opus format, converting it via FFmpeg, calling OpenAI's Whisper API for transcription, and generating a text reply. This showcases emerging agentic workflow automation without explicit voice feature support.
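The first two steps of that workflow (recognizing Opus audio and preparing an FFmpeg conversion) can be sketched as follows. This is an illustration under stated assumptions, not Clawdbot's actual logic: it assumes the voice message is Opus inside an Ogg container, and the helper names, the WAV target, and the 16 kHz mono settings are hypothetical choices. The transcription call itself is left out.

```python
from pathlib import Path


def is_ogg_opus(header: bytes) -> bool:
    # An Ogg stream starts with the capture pattern "OggS"; an Opus stream's
    # first packet begins with the "OpusHead" identification header.
    return header[:4] == b"OggS" and b"OpusHead" in header[:64]


def ffmpeg_to_wav_command(src: Path, dst: Path) -> list[str]:
    # Build (but do not run) an FFmpeg command that resamples to 16 kHz mono.
    # -y overwrites the output, -ar sets the sample rate, -ac the channel count.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


if __name__ == "__main__":
    # Synthetic header: "OggS", the rest of a minimal 28-byte page header,
    # then the start of the Opus identification packet.
    fake_header = b"OggS" + bytes(24) + b"OpusHead"
    print(is_ogg_opus(fake_header))
    print(" ".join(ffmpeg_to_wav_command(Path("voice.ogg"), Path("voice.wav"))))
```

The command list could be passed to `subprocess.run(..., check=True)` on a machine with FFmpeg installed, and the resulting file handed to whichever transcription service is in use.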
GOLF.AI Launches 24/7 AI Concierge Agent for Pro Shop Bookings, Voiced by Nick Faldo
GOLF.AI has launched a 24/7 AI agent that handles tee time bookings and Q&A for golf pro shops, featuring a voice interface modeled after Sir Nick Faldo. This represents a direct application of AI agents in a high-touch, appointment-driven retail environment.
Mistral AI Releases Voxtral TTS: 4B-Parameter Open-Weight Model Clones Voices from 3-Second Audio in 9 Languages
Mistral AI has launched Voxtral TTS, its first open-weight text-to-speech model. The 4B-parameter model clones voices from three seconds of reference audio across nine languages, with a latency of 70ms, and scored higher on naturalness than ElevenLabs Flash v2.5 in human tests.
Open-Source 'Manus Alternative' Emerges: Fully Local AI Agent with Web Browsing, Code Execution, and Voice Input
An open-source project has been released that replicates core features of AI agent platforms like Manus—autonomous web browsing, multi-language code execution, and voice input—while running entirely locally on user hardware with no external API dependencies.
TaxHacker: Open-Source, Self-Hosted AI App Automates Receipt and Invoice Processing
A developer released TaxHacker, a self-hosted AI accounting app that extracts data from receipts and invoices in any language, converts currencies, and exports to CSV. It is fully open-source under the MIT license and runs via Docker.
Salesforce Launches Agentforce Contact Center, Unifying AI Agents, Voice, and CRM
Salesforce introduces Agentforce Contact Center, a native platform integrating voice, digital channels, CRM data, and autonomous AI agents. It aims to solve integration complexity and improve AI-human collaboration for customer service.
LLM-as-a-Judge: A Practical Framework for Evaluating AI-Extracted Invoice Data
A technical guide demonstrating how to use LLMs as evaluators to assess the accuracy of AI-extracted invoice data, replacing manual checks and brittle validation rules with scalable, structured assessment.
OpenAI Teases Major Platform Evolution with New Voice and Multimodal Capabilities
OpenAI appears to be preparing significant upgrades to its AI platform, with hints pointing toward enhanced voice interaction capabilities and new multimodal features that could transform how users engage with artificial intelligence.
Anthropic's Claude Code Gets Voice Mode: The Next Frontier in AI-Assisted Programming
Anthropic has introduced voice mode for Claude Code, allowing developers to interact with the AI coding assistant through natural speech. This marks a significant evolution in how programmers can collaborate with AI tools, potentially transforming development workflows.
Typeless AI Redefines Voice-to-Text: From Transcription to Native-Level Rewriting
Typeless AI has introduced a revolutionary voice-to-text tool that doesn't just transcribe speech but rewrites it with native-level fluency, grammar correction, and tone adjustment across multiple languages, potentially eliminating manual typing for many professional tasks.
Voice-First AI Writing: The Silent Revolution Transforming How We Create
AI-powered voice dictation is evolving from a convenience tool to a core workflow, enabling real-time thought capture at speaking speed. This shift promises to fundamentally change how professionals write, edit, and create content.
OpenAI's Audio Revolution: New Voice Models Signal Major AI Advancements
OpenAI appears poised to release new audio models that could significantly enhance voice interaction capabilities. This development follows recent trademark filings and suggests major improvements to voice mode technology.
Enterprise AI Goes Mainstream: How Major Corporations Are Scaling Operations with Intelligent Voice Systems
Major corporations including FedEx, Marriott, and Volkswagen are deploying advanced AI voice systems to handle millions of customer interactions, enabling instant scalability during peak demand periods without traditional hiring constraints.
Developer Builds AI Baby Monitor with Voice Cloning in Under 24 Hours Using DevKit
A developer created a working MVP of a smart baby monitor that clones a mother's voice to soothe a crying infant, completing the project in less than 24 hours after unboxing a new devkit.
ElevenLabs Voice Cloning API Priced from $5 to $1,320/Month
ElevenLabs' AI voice cloning service has published pricing tiers from $5 to $1,320 per month. This formalizes the cost structure for developers and businesses integrating synthetic speech.
Microsoft's VibeVoice Family Processes 60-Minute Audio in Single Pass, Eliminates Chunking for ASR & TTS
Microsoft open-sourced VibeVoice, a family of speech AI models that processes up to 60 minutes of audio without chunking. It delivers structured transcriptions with speaker diarization and generates 90-minute multi-speaker speech in one pass.