Timeline
mlx-vlm v0.6.2 released with Gemma 4 QAT support and video input for the 12B model
MLX CUDA backend passes all tests, enabling NVIDIA GPU support
mlx-vlm v0.5.0 released with continuous batching, speculative decoding, and distributed inference for Apple Silicon
The next release is planned to introduce continuous batching, an OpenAI-compatible API, and vision caching.
Release of Apple's MLX framework for efficient on-device machine learning on Apple Silicon
Apple's MLX framework was highlighted at the AI Engineer Summit for enabling local, grounded reasoning in satellite, security, and robotics AI applications.
Released version 0.4.4 with support for Falcon-Perception 300M and TurboQuant Metal kernels.
TurboQuant Metal kernels achieved up to 1.9x faster decoding and 89% KV cache savings.
Released version 0.4.2 with support for SAM3 and DOTS-MOCR models, plus critical fixes.