mlx-audio v0.4.3 arrives with 6 new TTS models and server upgrades. The release targets Apple Silicon developers needing efficient, on-device audio generation.
Key facts
- 6 new TTS models added: Higgs Audio v2, OmniVoice, LongCat-AudioDiT 1B, MOSS-TTS-Nano, Irodori-TTS v2, MeloTTS-English
- OmniVoice supports 646+ languages
- Voxtral Realtime runs roughly 3x faster with 4-bit quantization
- 5 dependencies removed: librosa, soundfile, pyloudnorm, pydub, tiktoken
- 14 contributors, including 8 new
mlx-audio v0.4.3, announced by @Prince_Canuma on X, significantly expands what the library can do for audio on Apple Silicon. According to the announcement, the release adds six new text-to-speech (TTS) models: Higgs Audio v2 (voice cloning), OmniVoice (646+ languages), LongCat-AudioDiT 1B, MOSS-TTS-Nano, Irodori-TTS v2, and MeloTTS-English. Together they move the library from a niche tool toward a more comprehensive audio generation platform.
Per the release notes, the server component receives notable improvements: concurrent request handling and continuous batching for Qwen3 TTS, plus client-disconnect handling. This moves mlx-audio closer to production-grade serving, though the library remains primarily targeted at research and prototyping workflows on Mac hardware.
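To make the term concrete, here is a toy simulation of continuous batching in general. It is not mlx-audio's server code, and every name in it (`Request`, `steps_left`, `run_continuous_batching`) is hypothetical. The sketch assumes each request needs a known number of decode steps, and shows how a waiting request joins the batch as soon as another one finishes, rather than waiting for the whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    steps_left: int  # hypothetical: decode steps this request still needs (>= 1)


def run_continuous_batching(requests, max_batch_size):
    """Simulate iteration-level scheduling; return the batch composition per step."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    waiting = deque(requests)
    active = []
    timeline = []
    while waiting or active:
        # Admit waiting requests as soon as a slot is free, instead of
        # waiting for the whole current batch to finish.
        while waiting and len(active) < max_batch_size:
            active.append(waiting.popleft())
        timeline.append([r.name for r in active])
        # Run one decode step for every request currently in the batch.
        for r in active:
            r.steps_left -= 1
        # Retire finished requests so their slots free up immediately.
        active = [r for r in active if r.steps_left > 0]
    return timeline


if __name__ == "__main__":
    demo = [Request("a", 3), Request("b", 1), Request("c", 2)]
    for step, batch in enumerate(run_continuous_batching(demo, max_batch_size=2)):
        print(step, batch)
```

In the demo, request "c" takes over the slot freed by "b" after the first step while "a" is still running. A static batcher would instead hold "c" back until both "a" and "b" had finished.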
On performance, @Prince_Canuma reports that Voxtral Realtime runs roughly 3x faster with 4-bit quantization. Mel-Band-RoFormer is introduced for vocal source separation, adding a capability beyond TTS. Parakeet TDT gets long-form performance improvements, and Fish Speech S2 Pro gains batching support.
The dependency slimming may be the most telling change. Removing librosa, soundfile, pyloudnorm, pydub, and tiktoken signals a shift toward a leaner, more self-contained library. Fewer dependencies mean less installation friction and fewer potential version conflicts, a practical gain that arguably matters more to developers than any single model addition. It suggests the maintainers are prioritizing developer experience (DX) alongside feature breadth.
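One way to see the effect in an existing environment is to check which of the five packages are still installed after upgrading. This is a minimal sketch using only the standard library. It assumes the distribution names match the package names listed above, and `still_installed` is a hypothetical helper, not part of mlx-audio. A package that is still present may simply be required by something else in the environment.

```python
from importlib import metadata

# Packages this release is reported to drop as dependencies.
REMOVED_DEPS = ["librosa", "soundfile", "pyloudnorm", "pydub", "tiktoken"]


def still_installed(names):
    """Return the subset of `names` that is installed in this environment."""
    found = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed, nothing to report
        found.append(name)
    return found


if __name__ == "__main__":
    leftovers = still_installed(REMOVED_DEPS)
    if leftovers:
        print("Still installed (possibly needed by other packages):", ", ".join(leftovers))
    else:
        print("None of the removed dependencies are installed.")
```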
Per the announcement, the release credits 14 contributors, including 8 new ones, with special mentions of @lllucas, @KarnikShreyas, and @beshkenadze. A new MkDocs site and WebM audio support round out the update. To install or upgrade: uv pip install -U mlx-audio
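After upgrading, a quick check can confirm that the environment actually picked up the new version. The sketch below uses only the standard library and assumes the distribution is named mlx-audio, as in the install command. `parse_version` and `installed_at_least` are hypothetical helpers, and the version parsing is deliberately simplified: it compares only the leading numeric components.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.4.3' into (0, 4, 3); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_at_least(dist_name, minimum):
    """Return True or False, or None when the distribution is not installed."""
    try:
        current = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return parse_version(current) >= parse_version(minimum)


if __name__ == "__main__":
    result = installed_at_least("mlx-audio", "0.4.3")
    if result is None:
        print("mlx-audio is not installed in this environment.")
    elif result:
        print("mlx-audio is at v0.4.3 or newer.")
    else:
        print("mlx-audio is older than v0.4.3.")
```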
What to watch
Watch for community adoption of the new server features (concurrent requests, continuous batching) as a proxy for whether mlx-audio graduates from research tool to production serving layer. Also track whether OmniVoice's 646+ language claim gets independently benchmarked.









