mlx
30 articles about mlx in AI news
MLX CUDA Backend Passes All Tests, Closing Apple GPU Gap
MLX CUDA backend passes all tests, enabling NVIDIA GPU support. Milestone bridges Apple Silicon and CUDA ecosystems for ML workloads.
mlx-audio v0.4.3 Ships 6 New TTS Models, Slimmer Deps
mlx-audio v0.4.3 adds 6 TTS models, server concurrency, and slims dependencies, targeting Apple Silicon developers.
mlx-vlm v0.5.0 Adds Continuous Batching, Distributed Inference for Apple Silicon
mlx-vlm v0.5.0 adds continuous batching, speculative decoding, and distributed inference for Apple Silicon. The release supports Qwen3.5, Kimi K2.5, Gemma 4 video, and new models with 21 contributors.
DeepSeek-V4 Ported to MLX for Apple Silicon Inference
A developer has ported DeepSeek-V4 to Apple's MLX framework, allowing the large language model to run on Apple Silicon Macs. Early results show functional inference with room for optimization.
Prince Canuma's M3 Ultra 512GB & RTX Pro 6000 Setup for MLX Research
Independent developer Prince Canuma has assembled a powerful, community-sponsored home compute cluster for MLX research and model porting, featuring an M3 Ultra with 512GB RAM and an RTX Pro 6000.
MLX-Benchmark Suite Launches as First Comprehensive LLM Eval for Apple Silicon
The MLX-Benchmark Suite has been released as the first comprehensive evaluation framework for Large Language Models running on Apple's MLX framework. It provides standardized metrics for models optimized for Apple Silicon hardware.
Qwen2.5-7B-Instruct 4-bit DWQ Model Released for Apple MLX
A developer has ported a 4-bit quantized Qwen2.5-7B-Instruct model to Apple's MLX framework. This makes the capable 7B model more efficient to run on Apple Silicon Macs.
MLX-VLM Adds Continuous Batching, OpenAI API, and Vision Cache for Apple Silicon
The next release of MLX-VLM will introduce continuous batching, an OpenAI-compatible API, and vision feature caching for multimodal models running locally on Apple Silicon. These optimizations promise up to 228x speedups on cache hits for models like Gemma4.
DFlash Brings Speculative Decoding to Apple Silicon via MLX
DFlash, a new open-source project, implements speculative decoding for large language models on Apple Silicon using the MLX framework, reportedly delivering up to 2.5x speedup on an M5 Max.
MLX Enables Local Grounded Reasoning for Satellite, Security, Robotics AI
Apple's MLX framework is enabling 'local grounded reasoning' for AI applications in satellite imagery, security systems, and robotics, moving complex tasks from the cloud to on-device processing.
Technical Implementation: Building a Local Fine-Tuning Engine with MLX
A developer shares a backend implementation guide for automating the fine-tuning process of AI models using Apple's MLX framework. This enables private, on-device model customization without cloud dependencies, which is crucial for handling sensitive data.
MLX-LM v0.9.0 Adds Better Batching, Supports Gemma 4 on Apple Silicon
Apple's MLX-LM framework released version 0.9.0 with enhanced server batching and support for Google's Gemma 4 model, improving local LLM inference efficiency on Apple Silicon. This update addresses a key performance bottleneck for developers running models locally on Mac hardware.
Gemma 4 Ported to MLX-Swift, Runs Locally on Apple Silicon
Google's Gemma 4 language model has been ported to the MLX-Swift framework by a community developer, making it available for local inference on Apple Silicon Macs and iOS devices through the LocallyAI app.
mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup
The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.
Gemma 4 26B A4B Hits 45.7 tokens/sec Decode Speed on MacBook Air via MLX Community
A community benchmark shows the Gemma 4 26B A4B model running at 45.7 tokens/sec decode speed on a MacBook Air using the MLX framework. This highlights rapid progress in efficient local deployment of mid-size language models on consumer Apple Silicon.
Roboflow's RF-DETR Model Ported to Apple MLX, Enabling Real-Time On-Device Instance Segmentation
Roboflow's RF-DETR object detection model is now available on Apple's MLX framework, enabling real-time instance segmentation on Apple Silicon devices. This port unlocks new on-device visual analysis applications for robotics and augmented vision-language models.
Ollama Now Supports Apple MLX Backend for Local LLM Inference on macOS
Ollama, the popular framework for running large language models locally, has added support for Apple's MLX framework as a backend. This enables more efficient execution of models like Llama 3.2 and Mistral on Apple Silicon Macs.
Sam3 + MLX Enables Local, Multi-Object Video Tracking Without Cloud APIs
A developer has combined Meta's Segment Anything 3 (Sam3) with Apple's MLX framework to enable local, on-device object tracking in videos. This bypasses cloud API costs and latency for computer vision tasks.
Facebook's SAM 3 Vision Model Ported to Apple's MLX Framework, Enabling Real-Time Tracking on M3 Max
Facebook's Segment Anything Model 3 (SAM 3) has been ported to Apple's MLX framework, enabling real-time object tracking on an M3 Max MacBook Pro. This demonstrates efficient on-device execution of a foundational vision model without cloud dependency.
mlx-vlm v0.4.2 Adds SAM3, DOTS-MOCR Models and Critical Fixes for Vision-Language Inference on Apple Silicon
mlx-vlm v0.4.2 released with support for Meta's SAM3 segmentation model and DOTS-MOCR document OCR, plus fixes for Qwen3.5, LFM2-VL, and Magistral models. Enables efficient vision-language inference on Apple Silicon via MLX framework.
Qwen3-TTS Added to mlx-tune, Enabling Full Qwen Model Fine-Tuning on Apple Silicon Macs
The mlx-tune library now supports Qwen3-TTS, making the entire Qwen model stack—including the new text-to-speech model—fine-tunable on Apple Silicon Macs. This expands local AI development options for researchers and developers.
TurboQuant Ported to Apple MLX, Claims 75% Memory Reduction with Minimal Performance Loss
Developer Prince Canuma has successfully ported the TurboQuant quantization method to Apple's MLX framework, reporting a 75% reduction in memory usage with nearly no performance degradation for on-device AI models.
Microsoft’s VibeVoice: Open-Source Speech-to-Text with Diarization
Microsoft released VibeVoice, an MIT-licensed speech-to-text model with built-in speaker diarization. Simon Willison tested a 4-bit MLX conversion on an M5 MacBook, transcribing 1 hour of audio in ~9 minutes using ~60GB RAM.
RunAnywhere's MetalRT Engine Delivers Breakthrough AI Performance on Apple Silicon
RunAnywhere has launched MetalRT, a proprietary GPU inference engine that dramatically accelerates on-device AI workloads on Apple Silicon. Their open-source RCLI tool demonstrates sub-200ms voice AI pipelines, outperforming existing solutions like llama.cpp and Apple's MLX.
Qwen 3.6 27B Hits 34 tok/s on M5 Max MacBook Pro
Qwen 3.6 27B hits 34 tok/s on M5 Max MacBook Pro with 90% acceptance rate, per @rohanpaul_ai. Shows viable local LLM inference on Apple Silicon.
Google Quantum Chip Breaks Bitcoin Cryptography: Threat Analysis
Google demonstrated a quantum computer capable of breaking the elliptic curve cryptography (ECDSA-256) securing Bitcoin and Ethereum. This poses an existential threat to these networks unless they migrate to quantum-resistant algorithms.
Qwen3.6-27B: How to Run a 17GB Local Model That Beats 397B MoE on Coding Tasks
Qwen3.6-27B delivers flagship-level coding performance in a 55.6GB model that can be quantized to 16.8GB, making high-quality local coding assistance accessible.
PRL-Bench: LLMs Score Below 50% on End-to-End Physics Research Tasks
Researchers introduced PRL-Bench, a benchmark built from 100 recent Physical Review Letters papers, testing LLMs on end-to-end physics research. Top models scored below 50%, exposing a significant capability gap for autonomous scientific discovery.
Gur Singh Claims 7 M4 MacBooks Match A100, Calls Cloud GPU Training a 'Scam'
Developer Gur Singh posted that seven M4 MacBooks (2.9 TFLOPS each) match an NVIDIA A100's performance, calling cloud GPU training a 'scam' and advocating for distributed, consumer-hardware approaches.
AirTrain Enables Distributed ML Training on MacBooks Over Wi-Fi
Developer @AlexanderCodes_ open-sourced AirTrain, a tool that enables distributed ML training across Apple Silicon MacBooks using Wi-Fi by syncing gradients every 500 steps instead of every step. This makes personal device training feasible for models up to 70B parameters without cloud GPU costs.