Whisper

30 articles about Whisper in AI news

Clawdbot AI Agent Autonomously Transcribes & Replies to Voice Messages Using Whisper API

A user demonstrated Clawdbot, an AI agent, autonomously handling a voice message. It detected the Opus format, converted the file with FFmpeg, transcribed it with OpenAI's Whisper API, and generated a text reply. The demo shows agentic workflow automation succeeding even without built-in voice-message support.

Mar 29, 2026 · 89% relevant

Insanely Fast Whisper CLI Transcribes 2.5 Hours of Audio in 98 Seconds with Flash Attention 2

A new open-source CLI tool called Insanely Fast Whisper reports a 19x speedup over standard Whisper large-v3. It transcribes 150 minutes of audio in 98 seconds by combining Flash Attention 2 with batching, with no reported loss in quality.

Mar 27, 2026 · 97% relevant

Whisper's Real-Time Translation Demo Shows Practical Progress Toward Universal Translation

OpenAI's Whisper model demonstrated real-time translation from English to Spanish, showcasing progress toward practical universal translation tools. The demo highlights incremental but meaningful improvements in speech-to-speech translation latency and quality.

Mar 18, 2026 · 85% relevant

Cohere Transcribe: 2B-Parameter Open-Source ASR Model Achieves 5.42% WER, Topping Hugging Face Leaderboard

Cohere released Transcribe, a 2B-parameter open-source speech recognition model. It claims a 5.42% average word error rate, beating OpenAI Whisper v3 and topping the Hugging Face Open ASR Leaderboard.

Mar 27, 2026 · 95% relevant

GPT-5.2-Based Smart Speaker Achieves 100% Resident ID Accuracy in Care Home Safety Evaluation

Researchers evaluated a voice-enabled smart speaker for care homes that uses Whisper and RAG. With GPT-5.2, it reached 100% accuracy in resident identification and 89.09% in reminder recognition. The safety-focused framework also highlights a remaining challenge: converting informal speech into calendar events (84.65% accuracy).

Mar 26, 2026 · 77% relevant

How Simon Willison Ported a 0.2B Image Model to the Browser with Claude

Simon Willison used Claude Code to port a 0.2B image inpainting model to WebGPU, running it as a parallel side project while his main agent worked on Datasette. His technique: research the problem with Claude.ai, then hand the findings to Claude Code in a research.md file.

Jun 22, 2026 · 70% relevant

Cursor Trains GPT-Size Model with 10-20x Compute

Cursor announced at Compile that it has trained a GPT-size model from scratch with 10-20x more compute. The move marks a shift from fine-tuning existing models to pretraining its own for code generation.

Jun 21, 2026 · 91% relevant

OpenAI Merges Codex into ChatGPT, Ending Standalone API

OpenAI is merging Codex into ChatGPT and discontinuing the standalone API. Developers must now use the chat interface for code generation.

Jun 2, 2026 · 87% relevant

ColPali Beats OCR Pipelines for Document RAG: No Chunking, 8× Storage Cost

ColPali eliminates OCR and chunking for document-heavy RAG by encoding each 16×16 image patch into a 128-dim vector. It outperforms prior SOTA on the ViDoRe benchmark but costs 8× more storage per page.

May 18, 2026 · 84% relevant

xAI Bundles SuperGrok into Hermes Agent — No API Key Needed

xAI integrated SuperGrok subscriptions into Hermes Agent, enabling a single OAuth login for Grok 4.3, TTS, images, and X search and eliminating separate API keys.

May 17, 2026 · 82% relevant

Cursor SDK Turns AI Agent Runtime into Programmable Infrastructure

Cursor is releasing an SDK that turns its agent runtime into programmable infrastructure for headless use in CI/CD pipelines, internal tools, and third-party products. Revenue scales with compute tokens rather than seats, enabling higher volume without a human in the loop.

Apr 29, 2026 · 82% relevant

Microsoft’s VibeVoice: Open-Source Speech-to-Text with Diarization

Microsoft released VibeVoice, an MIT-licensed speech-to-text model with built-in speaker diarization. Simon Willison tested a 4-bit MLX conversion on an M5 MacBook, transcribing 1 hour of audio in ~9 minutes using ~60GB RAM.

Apr 27, 2026 · 85% relevant

Walmart Expands B2B Services

Walmart is expanding its B2B services beyond retail, now offering plumbing, electrical, and general facilities maintenance to local convenience stores and small businesses, leveraging its existing infrastructure and vendor relationships.

Apr 23, 2026 · 78% relevant

CodeRabbit AI Absorbs Codebase History, Reduces 'Bus Factor' Risk

A developer's tweet highlights CodeRabbit's ability to remember a team's codebase history and past decisions, directly addressing the 'bus factor' problem of over-reliance on senior engineers.

Apr 22, 2026 · 75% relevant

UC San Diego Study: AI Copilots Slow Down Experienced Developers

A real-world study from UC San Diego shows AI coding assistants like GitHub Copilot can slow down experienced developers, increasing task time by up to 50%. This challenges the assumption that AI tools universally boost productivity for all skill levels.

Apr 22, 2026 · 87% relevant

Google Hits 75% AI-Generated Code, Up From 50% in Fall 2025

Google reports 75% of all new code is now AI-generated and engineer-approved, a sharp increase from 50% last fall. This indicates a massive, accelerating shift in software development practices at the tech giant.

Apr 22, 2026 · 85% relevant

SpaceXAI Partners with Cursor AI to Build 'World's Best' Coding Assistant

SpaceXAI and Cursor AI announced a partnership to integrate SpaceX's engineering data with Cursor's editor, aiming to create a top-tier AI for coding and knowledge work.

Apr 21, 2026 · 100% relevant

Google DeepMind Forms 'Strike Team' to Boost AI Coding, Citing Anthropic Pressure

Google has formed a specialized team within DeepMind to rapidly improve its AI coding capabilities. The move is a direct response to internal assessments that Anthropic's tools are more advanced, with leadership pushing for agentic systems.

Apr 20, 2026 · 100% relevant

OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate

An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.

Apr 20, 2026 · 85% relevant

NVIDIA's Audio Flamingo Next: 30-Min Audio, Time-Grounded Reasoning

NVIDIA has launched Audio Flamingo Next, a next-generation open audio-language model supporting 30-minute audio inputs and time-grounded reasoning. Trained on over 1 million hours of data, it reportedly outperforms larger models on key audio understanding benchmarks.

Apr 19, 2026 · 95% relevant

WOZCODE Launches Free Claude Code Plugin, Claims Up to 40% Speed Boost

WOZCODE has launched a free plugin for Claude Code, claiming it makes coding sessions 30-40% faster and reduces costs by up to 55%. The plugin is available now.

Apr 18, 2026 · 100% relevant

Meta Employee Builds 'Claudeonomics' Dashboard for Internal AI Token Competition

A Meta employee built an internal dashboard called 'Claudeonomics' that ranks coworkers by their usage of company AI tokens, creating a gamified competition and providing a novel view into internal AI tool adoption patterns.

Apr 16, 2026 · 75% relevant

Google's Auto-Diagnose AI Hits 90% Accuracy Debugging Test Failures

Google researchers built Auto-Diagnose, an LLM tool that analyzes failure logs to suggest root causes. It achieved 90.14% accuracy in evaluation and was used on over 52,000 distinct failing tests after company-wide deployment.

Apr 16, 2026 · 87% relevant

Meta Mandates 65-80% AI-Generated Code by Mid-2026, Zuckerberg Returns to Lab

Meta is mandating that 65-80% of its developers' code be written by AI by mid-2026. CEO Mark Zuckerberg has moved his desk into the company's AI lab and resumed hands-on coding after a 20-year hiatus.

Apr 15, 2026 · 99% relevant

ChatGPT App Code Hints at Upcoming Image Feature Announcement

A developer found new strings in the ChatGPT app's code referencing an 'image announcement,' signaling a likely upcoming feature reveal from OpenAI.

Apr 15, 2026 · 85% relevant

Claude Code Best Practice Repo Hits 19.7K Stars with 84 Anthropic Tips

A GitHub repository called 'claude-code-best-practice' has amassed 19.7K stars by compiling 84 production tips from Anthropic's Claude Code creators. It provides a full open-source framework for moving from basic usage to advanced agentic workflows.

Apr 13, 2026 · 91% relevant

Mind: Open-Source Persistent Memory for AI Coding Agents

An open-source tool called Mind creates a shared memory layer for AI coding agents, allowing them to remember project context across sessions and different interfaces like Claude Code, Cursor, and Windsurf.

Apr 12, 2026 · 85% relevant

AMD AI Director Reports Claude Code Quality Decline, Cites 234k Tool Calls

An AMD AI executive presented data from over 6,800 sessions showing Claude Code's performance has declined since early March, with rising instances of shallow reasoning and incomplete tasks. This raises significant trust issues for engineers using the model in complex development workflows.

Apr 11, 2026 · 89% relevant

OpenMontage: Open-Source Agentic Video Production System Costs $0.69 Per Ad

OpenMontage, an open-source agentic video production system, has been released. It orchestrates 11 pipelines and 49 tools across multiple AI providers to autonomously script, generate assets, edit, and render videos from a plain language prompt.

Apr 11, 2026 · 99% relevant

Developer Fired After Manager Discovers His Claude Code Use, Opts for LLM Output

A developer was fired after his manager discovered he had used Claude AI to build a project. The manager then had the AI "vibe code" a replacement in days and dismissed the developer's warnings about AI hallucinations on complex requirements.

Apr 10, 2026 · 85% relevant
