gentic.news — AI News Intelligence Platform


mlx-audio v0.4.3 Ships 6 New TTS Models, Slimmer Deps

mlx-audio v0.4.3 adds six TTS models and server concurrency, and slims its dependencies, targeting Apple Silicon developers.

10h ago · 2 min read · AI-Generated
What's new in mlx-audio v0.4.3?

mlx-audio v0.4.3 adds six new TTS models (Higgs Audio v2, OmniVoice, LongCat-AudioDiT 1B, MOSS-TTS-Nano, Irodori-TTS v2, MeloTTS-English), concurrent request handling with continuous batching on the server, roughly 3x faster Voxtral Realtime inference, and a slimmer dependency set that drops librosa, soundfile, pyloudnorm, pydub, and tiktoken.

TL;DR

6 new TTS models added · Server gets concurrent requests + batching · Voxtral Realtime ~3x faster on 4-bit

mlx-audio v0.4.3 arrives with six new TTS models and server upgrades. The release targets Apple Silicon developers who need efficient, on-device audio generation.

Key facts

mlx-audio v0.4.3, announced by @Prince_Canuma on X, brings a significant expansion of capabilities for Apple Silicon audio processing. [According to @Prince_Canuma] The release adds six new text-to-speech (TTS) models: Higgs Audio v2 (voice cloning), OmniVoice (646+ languages), LongCat-AudioDiT 1B, MOSS-TTS-Nano, Irodori-TTS v2, and MeloTTS-English. This broadens the library's utility from a niche tool to a more comprehensive audio generation platform.

The server component receives notable improvements: concurrent request handling and continuous batching for Qwen3 TTS, plus client-disconnect handling. [Per the release notes] This moves mlx-audio closer to production-grade serving capability, though the library remains primarily targeted at research and prototyping workflows on Mac hardware.
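The release notes don't publish the batching internals, but the general continuous-batching pattern can be sketched generically (this is an illustrative toy, not mlx-audio code): a worker blocks until one request arrives, then drains everything else that queued up in the meantime and processes the whole set as a single batch.

```python
import asyncio


async def batch_worker(queue: asyncio.Queue) -> None:
    """Toy continuous-batching loop: block for one request, drain the
    rest of the queue, handle them together, repeat."""
    while True:
        text, fut = await queue.get()
        batch = [(text, fut)]
        # Pick up any requests that piled up while the last batch ran.
        while not queue.empty():
            batch.append(queue.get_nowait())
        # Stand-in for one batched model call over all queued texts.
        for text, fut in batch:
            fut.set_result(f"audio<{text}>")


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue))
    loop = asyncio.get_running_loop()
    futs = []
    for text in ["hello", "world", "batch me"]:
        fut = loop.create_future()
        await queue.put((text, fut))
        futs.append(fut)
    results = await asyncio.gather(*futs)
    worker.cancel()
    return results


print(asyncio.run(main()))
```

The payoff of this pattern is that batch size adapts to load: an idle server processes requests one at a time with no added latency, while a busy server amortizes model overhead across everything waiting in the queue.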

Performance gains are led by Voxtral Realtime, which achieves roughly 3x faster inference with 4-bit quantization. [According to @Prince_Canuma] Mel-Band-RoFormer is introduced for vocal source separation, adding a capability beyond TTS. Parakeet TDT gains longform performance improvements, and Fish Speech S2 Pro gains batching support.
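MLX's actual quantization scheme isn't detailed in the announcement, but the core idea behind 4-bit weights is simple enough to sketch generically: map each float to one of 16 levels via a per-tensor scale, then pack two 4-bit codes into each byte, quartering the memory (and bandwidth) of float16 weights. The functions below are a hypothetical illustration, not MLX code.

```python
def quantize_4bit(weights: list[float]) -> tuple[bytes, float]:
    """Generic absmax 4-bit quantization sketch: scale floats into
    signed codes in [-8, 7], pack two codes per byte."""
    scale = max(abs(w) for w in weights) / 7.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    packed = bytearray()
    for i in range(0, len(codes), 2):
        lo = codes[i] & 0x0F
        hi = (codes[i + 1] & 0x0F) if i + 1 < len(codes) else 0
        packed.append(lo | (hi << 4))
    return bytes(packed), scale


def dequantize_4bit(packed: bytes, scale: float, n: int) -> list[float]:
    """Unpack two signed 4-bit codes per byte and rescale to floats."""
    out = []
    for byte in packed:
        for nib in (byte & 0x0F, byte >> 4):
            signed = nib - 16 if nib >= 8 else nib
            out.append(signed * scale)
    return out[:n]


w = [0.9, -0.35, 0.02, -0.71]
packed, scale = quantize_4bit(w)
print(len(packed))  # 2 bytes hold 4 weights: 4x smaller than float16
print(dequantize_4bit(packed, scale, len(w)))
```

Smaller weights mean less memory traffic per token, which is typically where the inference speedup comes from on memory-bound hardware like Apple Silicon.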

Structurally, the dependency slimming (removing librosa, soundfile, pyloudnorm, pydub, and tiktoken) signals a shift toward a leaner, more self-contained library. It reduces installation friction and potential version conflicts, a practical change that matters more to developers day-to-day than any single model addition, and it suggests the maintainers are prioritizing developer experience (DX) alongside feature breadth.
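As one example of what such slimming can look like in practice (not necessarily how mlx-audio did it), writing 16-bit PCM output with the standard-library wave module covers the common "save synthesized audio" case without pulling in soundfile or pydub:

```python
import math
import struct
import wave


def write_pcm16_wav(path: str, samples: list[float],
                    sample_rate: int = 24000) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file
    using only the standard library."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)          # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)


# 0.1 s of a 440 Hz sine wave as a stand-in for model output.
tone = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(2400)]
write_pcm16_wav("tone.wav", tone)
```

Heavier libraries still earn their keep for resampling, loudness normalization, or exotic formats; the slimming bet is that most users don't need those on the default install path.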

The release credits 14 contributors, including 8 new ones, with special mentions to @lllucas, @KarnikShreyas, and @beshkenadze. [Per the announcement] A new MkDocs site and WebM audio support round out the update. Installation is via uv pip install -U mlx-audio.

What to watch

Watch for community adoption of the new server features (concurrent requests, batching) as a proxy for whether mlx-audio graduates from research tool to production serving layer. Also track whether the OmniVoice 646-language claim gets independently benchmarked.

Source: gentic.news

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

This release is a clear step toward making mlx-audio a more complete audio toolkit rather than just a TTS wrapper. The server improvements — concurrent requests, continuous batching — are the most strategically significant addition, as they enable multi-user scenarios without requiring a separate inference server. The dependency slimming is a smart DX move that addresses a common pain point in Python audio libraries.

Compared to prior art like Coqui TTS or Piper, mlx-audio remains Mac-only but leverages Apple Silicon's unified memory for potentially lower latency than CPU-based alternatives. The inclusion of voice cloning (Higgs Audio v2) and 646-language support (OmniVoice) positions it against cloud APIs like ElevenLabs, but without the cloud dependency.

The contrarian take: while the six new models get headlines, the real story is the infrastructure improvements. Without concurrent request handling, the library was limited to single-user demos. Now it can plausibly serve small teams. The next missing piece is multi-GPU support, which would unlock larger model serving.
