MLX has reached a milestone: all tests are passing in its CUDA backend, according to developer zcbenz. The framework, originally built for Apple Silicon, now offers a validated CUDA path for GPU-accelerated machine learning.
Key facts
- MLX CUDA backend passes all tests.
- Announced by zcbenz via X, retweeted by @Prince_Canuma.
- MLX originally targeted Apple Silicon via Apple's Metal framework.
- CUDA backend enables NVIDIA GPU compatibility.
- Exact number of tests not disclosed.
MLX, Apple's open-source machine learning framework, has reached a significant technical milestone: all tests are passing in its CUDA backend. The news was announced in a post on X by developer zcbenz and retweeted by @Prince_Canuma. [According to @Prince_Canuma] The CUDA backend passing all tests indicates that the port of MLX's operations from Apple's Metal backend to NVIDIA's CUDA architecture is functionally complete and correct.
MLX was introduced by Apple's machine learning research team in 2023 as a framework for efficient training and inference on Apple Silicon, leveraging the unified memory architecture of M-series chips. The framework's design emphasizes array operations and automatic differentiation, similar to NumPy and PyTorch. Adding a CUDA backend extends MLX's reach to NVIDIA GPUs, which dominate the AI training and inference landscape.
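To make the programming model concrete, the sketch below implements a toy reverse-mode automatic differentiation in plain Python. The names (`Var`, `grad`) are invented for illustration and are not MLX's API; MLX's real counterpart is `mx.grad` on `mlx.core` arrays, and a production system differentiates whole array operations rather than scalars.

```python
# Toy scalar reverse-mode autodiff, illustrating the style of API MLX
# exposes. Names here are illustrative, not MLX's actual implementation.

class Var:
    """A scalar value that records the operations that produced it."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents    # Vars this node was computed from
        self.grad_fns = grad_fns  # local derivative w.r.t. each parent
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, (self, other),
                   (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, (self, other),
                   (lambda g: g * other.value, lambda g: g * self.value))

def grad(f):
    """Return a function computing df/dx, analogous in spirit to mx.grad."""
    def df(x):
        x_var = Var(x)
        out = f(x_var)
        # Topologically order the graph so each node's gradient is
        # complete before being propagated to its parents.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node.parents:
                    visit(p)
                order.append(node)
        visit(out)
        out.grad = 1.0
        for node in reversed(order):
            for parent, gfn in zip(node.parents, node.grad_fns):
                parent.grad += gfn(node.grad)
        return x_var.grad
    return df

# d/dx (x*x + 3x) = 2x + 3, so the gradient at x=2 is 7.
dfdx = grad(lambda x: x * x + x * 3)
print(dfdx(2.0))  # 7.0
```

Porting a framework like this to a new backend does not change this graph-building layer; what changes is the kernels each operation dispatches to, which is why a full test suite passing is a meaningful signal.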
Unique Take
The significance here is not just compatibility—it's about MLX becoming a bridge between Apple's hardware ecosystem and the broader CUDA-dependent ML stack. While Apple has pushed its own Metal framework for GPU compute, many ML libraries and models are optimized for CUDA. By passing all tests on the CUDA backend, MLX positions itself as a viable option for developers who want to write code once and run it on both Apple Silicon and NVIDIA GPUs. This could reduce friction for Apple-centric ML workflows that need to scale to cloud GPU clusters.
What Was Achieved
The exact number of tests was not disclosed, but the phrase "all tests are passing" implies comprehensive coverage of MLX's core operations—likely including matrix multiplication, convolutions, normalization layers, and gradient computation. Because different GPU architectures order floating-point operations differently, the CUDA backend would not be expected to match the Metal backend bit-for-bit; rather, it needs to reproduce the Metal backend's numerical behavior within tolerance to ensure model parity.
Prior Context
MLX's CUDA backend has been in development since late 2024, with earlier versions supporting only a limited set of operations. This milestone suggests the project has reached feature parity with the Metal backend. The framework remains relatively niche compared to PyTorch or TensorFlow, but Apple has been investing in MLX for on-device AI and research. [Per Apple's MLX GitHub repository] The CUDA backend could accelerate adoption by making MLX a more practical tool for hybrid workflows.
Implications
For developers, this means MLX can now serve as a unified framework for training on NVIDIA GPUs (cloud) and deploying on Apple Silicon (edge). This is particularly relevant for teams that train models on cloud GPU clusters and then ship GPU-accelerated, on-device inference in iOS/macOS apps. The milestone also validates the portability of MLX's design, which relies on lazy computation and graph optimization—features that translate well to CUDA's execution model.
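Lazy computation is what makes that design backend-portable: operations build a graph, and no kernel runs until a value is actually needed, so the same graph can be lowered to Metal or CUDA. A minimal sketch of the idea in plain Python (illustrative names, not MLX's API; in real MLX, evaluation is forced with `mx.eval` or by inspecting a result):

```python
# Minimal sketch of lazy (deferred) evaluation: operations record a
# graph, and computation happens only when a result is requested.

class Lazy:
    def __init__(self, op, *inputs, value=None):
        self.op, self.inputs, self._value = op, inputs, value

    @staticmethod
    def constant(v):
        return Lazy("const", value=v)

    def __add__(self, other):
        return Lazy("add", self, other)

    def __mul__(self, other):
        return Lazy("mul", self, other)

    def eval(self):
        """Force evaluation, caching each node's result."""
        if self.op == "const":
            return self._value
        if self._value is None:
            args = [i.eval() for i in self.inputs]
            self._value = {"add": lambda a, b: a + b,
                           "mul": lambda a, b: a * b}[self.op](*args)
        return self._value

a = Lazy.constant(3.0)
b = Lazy.constant(4.0)
c = a * b + a     # nothing computed yet; c is just a graph
print(c.eval())   # 15.0 — computation happens only here
```

Because the backend only sees the graph at `eval` time, swapping the kernels underneath (Metal versus CUDA) leaves user code unchanged—the property the test-suite milestone validates.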
Limitations
The announcement did not include performance benchmarks comparing CUDA backend throughput against native Metal or PyTorch CUDA. Passing tests ensures correctness but not necessarily competitive speed. Memory usage and kernel launch overhead on CUDA versus Metal remain open questions. The framework also lacks the ecosystem breadth of PyTorch, meaning users may need to implement custom operations.
What to watch
Watch for an MLX release with CUDA backend support in the official GitHub repository, and for benchmark comparisons against PyTorch's CUDA backend or MLX's Metal backend on training throughput and inference latency.