gentic.news — AI News Intelligence Platform


mlx-audio v0.4.3 Ships 6 New TTS Models, Slimmer Deps

mlx-audio v0.4.3 adds six TTS models and server concurrency, and slims its dependencies, targeting Apple Silicon developers.

10h ago · 2 min read · AI-Generated
What's new in mlx-audio v0.4.3?

mlx-audio v0.4.3 adds six new TTS models (Higgs Audio v2, OmniVoice, LongCat-AudioDiT 1B, MOSS-TTS-Nano, Irodori-TTS v2, MeloTTS-English), concurrent request handling with continuous batching on the server, roughly 3x faster Voxtral Realtime inference, and a slimmer dependency set that drops librosa, soundfile, pyloudnorm, pydub, and tiktoken.

TL;DR

6 new TTS models added · Server gets concurrent requests + batching · Voxtral Realtime ~3x faster on 4-bit

mlx-audio v0.4.3 arrives with six new TTS models and server upgrades. The release targets Apple Silicon developers who need efficient, on-device audio generation.

Key facts

mlx-audio v0.4.3, announced by @Prince_Canuma on X, brings a significant expansion of capabilities for Apple Silicon audio processing. [According to @Prince_Canuma] The release adds six new text-to-speech (TTS) models: Higgs Audio v2 (voice cloning), OmniVoice (646+ languages), LongCat-AudioDiT 1B, MOSS-TTS-Nano, Irodori-TTS v2, and MeloTTS-English. This broadens the library's utility from a niche tool to a more comprehensive audio generation platform.

The server component receives notable improvements: concurrent request handling and continuous batching for Qwen3 TTS, plus client-disconnect handling. [Per the release notes] This moves mlx-audio closer to production-grade serving capability, though the library remains primarily targeted at research and prototyping workflows on Mac hardware.
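The release notes don't publish the batching internals, but the general continuous-batching pattern can be sketched generically (this is an illustrative toy, not mlx-audio code): a worker blocks until one request arrives, then drains everything else that queued up in the meantime and processes the whole set as a single batch.

```python
import asyncio


async def batch_worker(queue: asyncio.Queue) -> None:
    """Toy continuous-batching loop: block for one request, drain the
    rest of the queue, handle them together, repeat."""
    while True:
        text, fut = await queue.get()
        batch = [(text, fut)]
        # Pick up any requests that piled up while the last batch ran.
        while not queue.empty():
            batch.append(queue.get_nowait())
        # Stand-in for one batched model call over all queued texts.
        for text, fut in batch:
            fut.set_result(f"audio<{text}>")


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue))
    loop = asyncio.get_running_loop()
    futs = []
    for text in ["hello", "world", "batch me"]:
        fut = loop.create_future()
        await queue.put((text, fut))
        futs.append(fut)
    results = await asyncio.gather(*futs)
    worker.cancel()
    return results


print(asyncio.run(main()))
```

The payoff of this pattern is that batch size adapts to load: an idle server processes requests one at a time with no added latency, while a busy server amortizes model overhead across everything waiting in the queue.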

Performance gains are led by Voxtral Realtime, which achieves roughly 3x faster inference with 4-bit quantization. [According to @Prince_Canuma] Mel-Band-RoFormer is introduced for vocal source separation, adding a capability beyond TTS. Parakeet TDT gains longform performance improvements, and Fish Speech S2 Pro gains batching support.
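MLX's actual quantization scheme isn't detailed in the announcement, but the core idea behind 4-bit weights is simple enough to sketch generically: map each float to one of 16 levels via a per-tensor scale, then pack two 4-bit codes into each byte, quartering the memory (and bandwidth) of float16 weights. The functions below are a hypothetical illustration, not MLX code.

```python
def quantize_4bit(weights: list[float]) -> tuple[bytes, float]:
    """Generic absmax 4-bit quantization sketch: scale floats into
    signed codes in [-8, 7], pack two codes per byte."""
    scale = max(abs(w) for w in weights) / 7.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    packed = bytearray()
    for i in range(0, len(codes), 2):
        lo = codes[i] & 0x0F
        hi = (codes[i + 1] & 0x0F) if i + 1 < len(codes) else 0
        packed.append(lo | (hi << 4))
    return bytes(packed), scale


def dequantize_4bit(packed: bytes, scale: float, n: int) -> list[float]:
    """Unpack two signed 4-bit codes per byte and rescale to floats."""
    out = []
    for byte in packed:
        for nib in (byte & 0x0F, byte >> 4):
            signed = nib - 16 if nib >= 8 else nib
            out.append(signed * scale)
    return out[:n]


w = [0.9, -0.35, 0.02, -0.71]
packed, scale = quantize_4bit(w)
print(len(packed))  # 2 bytes hold 4 weights: 4x smaller than float16
print(dequantize_4bit(packed, scale, len(w)))
```

Smaller weights mean less memory traffic per token, which is typically where the inference speedup comes from on memory-bound hardware like Apple Silicon.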

Structurally, the dependency slimming (removing librosa, soundfile, pyloudnorm, pydub, and tiktoken) signals a shift toward a leaner, more self-contained library. It reduces installation friction and potential version conflicts, a practical change that matters more to developers day-to-day than any single model addition, and it suggests the maintainers are prioritizing developer experience (DX) alongside feature breadth.
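As one example of what such slimming can look like in practice (not necessarily how mlx-audio did it), writing 16-bit PCM output with the standard-library wave module covers the common "save synthesized audio" case without pulling in soundfile or pydub:

```python
import math
import struct
import wave


def write_pcm16_wav(path: str, samples: list[float],
                    sample_rate: int = 24000) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file
    using only the standard library."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)          # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)


# 0.1 s of a 440 Hz sine wave as a stand-in for model output.
tone = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(2400)]
write_pcm16_wav("tone.wav", tone)
```

Heavier libraries still earn their keep for resampling, loudness normalization, or exotic formats; the slimming bet is that most users don't need those on the default install path.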

The release credits 14 contributors, including 8 new ones, with special mentions to @lllucas, @KarnikShreyas, and @beshkenadze. [Per the announcement] A new MkDocs site and WebM audio support round out the update. Installation is via uv pip install -U mlx-audio.

What to watch

Watch for community adoption of the new server features (concurrent requests, batching) as a proxy for whether mlx-audio graduates from research tool to production serving layer. Also track whether the OmniVoice 646-language claim gets independently benchmarked.

Source: gentic.news

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

This release is a clear step toward making mlx-audio a more complete audio toolkit rather than just a TTS wrapper. The server improvements — concurrent requests, continuous batching — are the most strategically significant addition, as they enable multi-user scenarios without requiring a separate inference server. The dependency slimming is a smart DX move that addresses a common pain point in Python audio libraries.

Compared to prior art like Coqui TTS or Piper, mlx-audio remains Mac-only but leverages Apple Silicon's unified memory for potentially lower latency than CPU-based alternatives. The inclusion of voice cloning (Higgs Audio v2) and 646-language support (OmniVoice) positions it against cloud APIs like ElevenLabs, but without the cloud dependency.

The contrarian take: while the six new models get headlines, the real story is the infrastructure improvements. Without concurrent request handling, the library was limited to single-user demos. Now it can plausibly serve small teams. The next missing piece is multi-GPU support, which would unlock larger model serving.
