whisper
30 articles about whisper in AI news
Clawdbot AI Agent Autonomously Transcribes & Replies to Voice Messages Using Whisper API
A user demonstrated Clawdbot, an AI agent, autonomously handling a voice message: detecting its Opus format, converting it via FFmpeg, calling OpenAI's Whisper API for transcription, and generating a text reply. This showcases emerging agentic workflow automation without explicit voice feature support.
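The steps in that summary (detect the format, convert with FFmpeg, transcribe) can be sketched roughly as follows. This is an illustration only, not Clawdbot's actual code. It assumes the voice message is an Ogg container holding an Opus stream, that `ffmpeg` is on the PATH, and that the `openai` Python package is installed with an API key in the environment. The function names are hypothetical.

```python
import subprocess

OGG_MAGIC = b"OggS"       # every Ogg page starts with this capture pattern
OPUS_MAGIC = b"OpusHead"  # the first packet of an Opus stream starts with this


def is_ogg_opus(header: bytes) -> bool:
    """Guess whether the leading bytes of a file look like Ogg Opus audio."""
    return header.startswith(OGG_MAGIC) and OPUS_MAGIC in header[:64]


def ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an FFmpeg call that converts src to mono 16 kHz audio at dst."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]


def transcribe(path: str) -> str:
    """Send the converted file to OpenAI's transcription endpoint."""
    from openai import OpenAI  # third-party; imported lazily so the helpers above work without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text


def handle_voice_message(src: str) -> str:
    """Detect, convert, and transcribe one voice message (hypothetical flow)."""
    with open(src, "rb") as f:
        header = f.read(64)
    if not is_ogg_opus(header):
        raise ValueError("not an Ogg Opus file")
    dst = src + ".wav"
    subprocess.run(ffmpeg_command(src, dst), check=True)
    return transcribe(dst)
```

Generating the text reply would be a separate model call on the returned transcript and is left out here.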
Insanely Fast Whisper CLI Transcribes 2.5 Hours of Audio in 98 Seconds with Flash Attention 2
A new open-source CLI tool called Insanely Fast Whisper achieves a 19x speedup over standard Whisper large-v3, transcribing 150 minutes of audio in 98 seconds using Flash Attention 2 and batching, with no quality loss.
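To put those figures in perspective, the arithmetic can be worked through in a few lines. This sketch uses only the numbers quoted above. The baseline time is not reported in the summary; it is inferred here by assuming the 19x speedup applies to the same 150-minute file. The function names are hypothetical.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / wall_seconds


def implied_baseline_seconds(fast_seconds: float, speedup: float) -> float:
    """Wall-clock time the slower baseline would need, given a claimed speedup."""
    return fast_seconds * speedup


audio_s = 150 * 60  # 150 minutes of audio = 9000 seconds
fast_s = 98         # reported transcription time
speedup = 19        # reported speedup over standard Whisper large-v3

baseline_s = implied_baseline_seconds(fast_s, speedup)  # assumption: same file, same hardware

print(round(realtime_factor(audio_s, fast_s), 1))      # about 91.8x faster than real time
print(round(baseline_s / 60, 1))                       # implied baseline: about 31.0 minutes
print(round(realtime_factor(audio_s, baseline_s), 1))  # implied baseline: about 4.8x real time
```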
Whisper's Real-Time Translation Demo Shows Practical Progress Toward Universal Translation
OpenAI's Whisper model demonstrated real-time translation from English to Spanish, showcasing progress toward practical universal translation tools. The demo highlights incremental but meaningful improvements in speech-to-speech translation latency and quality.
Cohere Transcribe: 2B-Parameter Open-Source ASR Model Achieves 5.42% WER, Topping Hugging Face Leaderboard
Cohere released Transcribe, a 2B-parameter open-source speech recognition model. It claims a 5.42% average word error rate, beating OpenAI Whisper v3 and topping the Hugging Face Open ASR Leaderboard.
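For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below is a minimal illustration of that definition. It is not the leaderboard's scoring code, and it skips the text normalization (casing, punctuation) that real evaluations typically apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A 5.42% average WER therefore corresponds to roughly five or six word errors per hundred reference words.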
GPT-5.2-Based Smart Speaker Achieves 100% Resident ID Accuracy in Care Home Safety Evaluation
Researchers evaluated a voice-enabled smart speaker for care homes using Whisper and RAG, achieving 100% resident identification and 89.09% reminder recognition with GPT-5.2. The safety-focused framework highlights remaining challenges in converting informal speech to calendar events (84.65% accuracy).
Cursor SDK Turns AI Agent Runtime into Programmable Infrastructure
Cursor is releasing an SDK that turns its agent runtime into programmable infrastructure for headless use in CI/CD pipelines, internal tools, and third-party products. Revenue scales with compute tokens, not seats, enabling higher volume without human-in-the-loop.
Microsoft’s VibeVoice: Open-Source Speech-to-Text with Diarization
Microsoft released VibeVoice, an MIT-licensed speech-to-text model with built-in speaker diarization. Simon Willison tested a 4-bit MLX conversion on an M5 MacBook, transcribing 1 hour of audio in ~9 minutes using ~60GB RAM.
Walmart Expands B2B Services
Walmart is expanding its B2B services beyond retail, now offering plumbing, electrical, and general facilities maintenance to local convenience stores and small businesses, leveraging its existing infrastructure and vendor relationships.
CodeRabbit AI Absorbs Codebase History, Reduces 'Bus Factor' Risk
A developer's tweet highlights CodeRabbit's ability to remember a team's codebase history and past decisions, directly addressing the 'bus factor' problem of over-reliance on senior engineers.
UC San Diego Study: AI Copilots Slow Down Experienced Developers
A real-world study from UC San Diego shows AI coding assistants like GitHub Copilot can slow down experienced developers, increasing task time by up to 50%. This challenges the assumption that AI tools universally boost productivity for all skill levels.
Google Hits 75% AI-Generated Code, Up From 50% in Fall 2025
Google reports 75% of all new code is now AI-generated and engineer-approved, a sharp increase from 50% last fall. This indicates a massive, accelerating shift in software development practices at the tech giant.
SpaceXAI Partners with Cursor AI to Build 'World's Best' Coding Assistant
SpaceXAI and Cursor AI announced a partnership to integrate SpaceX's engineering data with Cursor's editor, aiming to create a top-tier AI for coding and knowledge work.
Google DeepMind Forms 'Strike Team' to Boost AI Coding, Citing Anthropic Pressure
Google has formed a specialized team within DeepMind to rapidly improve its AI coding capabilities. The move is a direct response to internal assessments that Anthropic's tools are more advanced, with leadership pushing for agentic systems.
OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate
An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.
NVIDIA's Audio Flamingo Next: 30-Min Audio, Time-Grounded Reasoning
NVIDIA has launched Audio Flamingo Next, a next-generation open audio-language model supporting 30-minute audio inputs and time-grounded reasoning. Trained on over 1 million hours of data, it reportedly outperforms larger models on key audio understanding benchmarks.
WOZCODE Launches Free Claude Code Plugin, Claims 40% Speed Boost
WOZCODE has launched a free plugin for Claude Code, claiming it makes coding sessions 30-40% faster and reduces costs by up to 55%. The plugin is available now.
Meta Employee Builds 'Claudeonomics' Dashboard for Internal AI Token Competition
A Meta employee built an internal dashboard called 'Claudeonomics' that ranks coworkers by their usage of company AI tokens, creating a gamified competition and providing a novel view into internal AI tool adoption patterns.
Google's Auto-Diagnose AI Hits 90% Accuracy Debugging Test Failures
Google researchers built Auto-Diagnose, an LLM tool that analyzes failure logs to suggest root causes. It achieved 90.14% accuracy in evaluation and was used on over 52,000 distinct failing tests after company-wide deployment.
Meta Mandates 65-80% AI-Generated Code by Mid-2026, Zuckerberg Returns to Lab
Meta is mandating that 65-80% of its developers' code be written by AI by mid-2026. CEO Mark Zuckerberg has moved his desk into the company's AI lab and resumed hands-on coding after a 20-year hiatus.
ChatGPT App Code Hints at Upcoming Image Feature Announcement
A developer found new strings in the ChatGPT app's code referencing an 'image announcement,' signaling a likely upcoming feature reveal from OpenAI.
Claude Code Best Practice Repo Hits 19.7K Stars with 84 Anthropic Tips
A GitHub repository called 'claude-code-best-practice' has amassed 19.7K stars by compiling 84 production tips from Anthropic's Claude Code creators. It provides a full open-source framework for moving from basic usage to advanced agentic workflows.
Mind: Open-Source Persistent Memory for AI Coding Agents
An open-source tool called Mind creates a shared memory layer for AI coding agents, allowing them to remember project context across sessions and different interfaces like Claude Code, Cursor, and Windsurf.
AMD AI Director Reports Claude Code Quality Decline, Cites 234k Tool Calls
An AMD AI executive presented data from over 6,800 sessions showing Claude Code's performance has declined since early March, with rising instances of shallow reasoning and incomplete tasks. This raises significant trust issues for engineers using the model in complex development workflows.
OpenMontage: Open-Source Agentic Video Production System Costs $0.69 Per Ad
OpenMontage, an open-source agentic video production system, has been released. It orchestrates 11 pipelines and 49 tools across multiple AI providers to autonomously script, generate assets, edit, and render videos from a plain language prompt.
Developer Fired After Manager Discovers Claude Code, Prefers LLM Output
A developer was fired after his manager discovered he had used Claude to build a project; the manager then had the AI 'vibe code' a replacement in days. The manager dismissed the developer's warnings about AI hallucinations on complex requirements.
Anthropic Launches Claude Cowork, Its AI-Powered Coding Assistant
Anthropic has made its Claude Cowork coding assistant generally available. This positions it directly against GitHub Copilot and other AI-powered development tools.
AI-Powered 'Vibe Coding' Drives 84% Surge in App Store Submissions
App Store submissions surged 84% last year to over 600,000 new apps, driven by AI-assisted 'vibe coding.' This rapid proliferation is devaluing traditional development skills and flooding the market with low-quality applications.
OpenClaw Voice Interface Demo Shows Real-Time AI Assistant Hardware
A developer showcased a custom hardware rig that integrates a push-button voice interface with the OpenClaw AI model, streaming responses in real-time. This demonstrates a tangible, open-source alternative to proprietary voice assistants like Amazon Alexa.
Microsoft's 'Markdownify' Converts PDFs, Audio, Video to Clean LLM Markdown
Microsoft launched 'Markdownify', a Python tool that converts PDFs, Word docs, Excel, PowerPoint, audio, and YouTube URLs into clean Markdown. This addresses a major pain point in AI pipelines where raw file parsing breaks context and structure.
Swap Your 100 MB Telegram Plugin for This 3.5 MB Rust MCP Server
A drop-in Rust replacement for Claude Code's Telegram plugin fixes common bugs, reduces memory usage by 95%, and enables reliable multi-agent setups.