On-Device ML

30 articles about on-device ML in AI news

Roboflow's RF-DETR Model Ported to Apple MLX, Enabling Real-Time On-Device Instance Segmentation

Roboflow's RF-DETR object detection model is now available on Apple's MLX framework, enabling real-time instance segmentation on Apple Silicon devices. The port opens up new on-device visual analysis applications, for robotics and for augmenting vision-language models.

89% relevant

AirTrain Enables Distributed ML Training on MacBooks Over Wi-Fi

Developer @AlexanderCodes_ open-sourced AirTrain, a tool that enables distributed ML training across Apple Silicon MacBooks using Wi-Fi by syncing gradients every 500 steps instead of every step. This makes personal device training feasible for models up to 70B parameters without cloud GPU costs.

95% relevant
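AirTrain's code isn't reproduced in the summary above, but the core trick it reports (infrequent synchronization instead of per-step all-reduce, often called local SGD or periodic parameter averaging) can be sketched. The NumPy simulation below, with two toy workers and a made-up quadratic loss, illustrates the general idea rather than AirTrain's actual implementation:

```python
import numpy as np

SYNC_EVERY = 500  # sync every N steps over Wi-Fi, as AirTrain reportedly does

def average_params(workers):
    """All-reduce by simple averaging: every worker receives the mean parameters."""
    mean = np.mean([w["params"] for w in workers], axis=0)
    for w in workers:
        w["params"] = mean.copy()

def local_step(worker, lr=0.01):
    """One local SGD step on a toy quadratic loss L(p) = ||p - target||^2."""
    grad = 2.0 * (worker["params"] - worker["target"])
    worker["params"] -= lr * grad

# Two simulated "MacBooks" pulling toward slightly different local optima.
workers = [
    {"params": np.zeros(4), "target": np.full(4, 1.0)},
    {"params": np.zeros(4), "target": np.full(4, 3.0)},
]

for step in range(1, 2001):
    for w in workers:
        local_step(w)
    if step % SYNC_EVERY == 0:  # infrequent Wi-Fi sync instead of per-step all-reduce
        average_params(workers)

# After a sync, both replicas hold identical parameters near the mean target (2.0).
print(workers[0]["params"])
```

The bandwidth saving is the point: each worker exchanges parameters once per 500 steps instead of exchanging gradients every step, which is what makes Wi-Fi viable as the interconnect.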

MLX Enables Local Grounded Reasoning for Satellite, Security, Robotics AI

Apple's MLX framework is enabling 'local grounded reasoning' for AI applications in satellite imagery, security systems, and robotics, moving complex tasks from the cloud to on-device processing.

85% relevant

Technical Implementation: Building a Local Fine-Tuning Engine with MLX

A developer shares a backend implementation guide for automating the fine-tuning process of AI models using Apple's MLX framework. This enables private, on-device model customization without cloud dependencies, which is crucial for handling sensitive data.

78% relevant

Gemma 4 Ported to MLX-Swift, Runs Locally on Apple Silicon

Google's Gemma 4 language model has been ported to the MLX-Swift framework by a community developer, making it available for local inference on Apple Silicon Macs and iOS devices through the LocallyAI app.

87% relevant

Meta's SAM 3 Vision Model Ported to Apple's MLX Framework, Enabling Real-Time Tracking on M3 Max

Meta's Segment Anything Model 3 (SAM 3) has been ported to Apple's MLX framework, enabling real-time object tracking on an M3 Max MacBook Pro. This demonstrates efficient on-device execution of a foundational vision model without cloud dependency.

87% relevant

TurboQuant Ported to Apple MLX, Claims 75% Memory Reduction with Minimal Performance Loss

Developer Prince Canuma has ported the TurboQuant quantization method to Apple's MLX framework, reporting a 75% reduction in memory usage with nearly no performance degradation for on-device AI models.

85% relevant
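The entry above doesn't describe TurboQuant's algorithm. For intuition, here is plain blockwise 4-bit affine quantization, a generic technique that also yields roughly a 75% memory saving relative to 16-bit weights; the function names are illustrative, not TurboQuant's API:

```python
import numpy as np

def quantize_4bit(w, block=32):
    """Blockwise affine 4-bit quantization: a per-block scale and zero-point.
    4-bit codes (two per byte when packed) instead of 16-bit floats cut
    weight memory by ~75%, ignoring the small per-block metadata overhead."""
    w = w.reshape(-1, block)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                 # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale) # guard constant blocks
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_4bit(q, scale, lo):
    """Reconstruct approximate weights from codes, scales, and zero-points."""
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 256)).astype(np.float32)
q, scale, lo = quantize_4bit(w.ravel())
w_hat = dequantize_4bit(q, scale, lo).reshape(w.shape)
err = np.abs(w - w_hat).max()
print(f"max abs reconstruction error: {err:.4f}")
```

The maximum error per weight is half a quantization step (the block's range divided by 30), which is why the quality loss stays small when blocks are kept narrow.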

MLX-Benchmark Suite Launches as First Comprehensive LLM Eval for Apple Silicon

The MLX-Benchmark Suite has been released as the first comprehensive evaluation framework for Large Language Models running on Apple's MLX framework. It provides standardized metrics for models optimized for Apple Silicon hardware.

85% relevant

HUOZIIME: A Research Framework for On-Device LLM-Powered Input Methods

A new research paper introduces HUOZIIME, a personalized on-device input method powered by a lightweight LLM. It uses a hierarchical memory mechanism to capture user-specific input history, enabling privacy-preserving, real-time text generation tailored to individual writing styles.

76% relevant

MLX-VLM Adds Continuous Batching, OpenAI API, and Vision Cache for Apple Silicon

The next release of MLX-VLM will introduce continuous batching, an OpenAI-compatible API, and vision feature caching for multimodal models running locally on Apple Silicon. These optimizations promise up to 228x speedups on cache hits for models like Gemma 4.

95% relevant
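MLX-VLM's cache internals aren't shown in the announcement above. The sketch below illustrates the general mechanism behind such cache-hit speedups: memoizing vision-encoder output by a content hash of the image, so a repeated image skips the expensive encoder entirely. The VisionFeatureCache class and slow_encoder stub are invented for illustration:

```python
import hashlib

class VisionFeatureCache:
    """Memoize expensive vision-encoder outputs keyed by image bytes.
    A cache hit skips the encoder, which is where large speedups on
    repeated images come from."""
    def __init__(self, encoder):
        self.encoder = encoder
        self.store = {}
        self.hits = 0
        self.misses = 0

    def features(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.encoder(image_bytes)
        return self.store[key]

# Stand-in for an expensive vision tower; records how often it actually runs.
calls = []
def slow_encoder(img):
    calls.append(img)
    return [len(img), sum(img) % 97]  # toy "embedding"

cache = VisionFeatureCache(slow_encoder)
img = bytes(range(16))
for _ in range(5):
    feats = cache.features(img)

print(cache.hits, cache.misses, len(calls))  # encoder ran only once for 5 requests
```

Real multimodal servers typically cache the encoder's token embeddings rather than raw outputs, but the hit/miss accounting is the same.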

DFlash Brings Speculative Decoding to Apple Silicon via MLX

DFlash, a new open-source project, implements speculative decoding for large language models on Apple Silicon using the MLX framework, reportedly delivering up to 2.5x speedup on an M5 Max.

85% relevant
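DFlash's implementation isn't given in the summary above. The toy below sketches the greedy variant of speculative decoding in general: a cheap draft model proposes k tokens, the target model verifies them in order, and the longest agreeing prefix is accepted, which preserves the target model's greedy output exactly. Both "models" here are arithmetic stand-ins, not real networks:

```python
import random

random.seed(0)

def target_next(ctx):
    """Stand-in for the large model: a deterministic greedy next token."""
    return (sum(ctx) * 31 + 7) % 8

def draft_next(ctx):
    """Stand-in for the small draft model: agrees with the target ~80% of the time."""
    t = target_next(ctx)
    return t if random.random() < 0.8 else (t + 1) % 8

def speculative_decode(ctx, n_tokens, k=4):
    """Draft k tokens cheaply, then verify against the target model.
    Every emitted token is the target's own choice, so the output is
    identical to plain greedy decoding; the draft only decides how many
    tokens each (expensive) verification pass can confirm at once."""
    out = list(ctx)
    while len(out) - len(ctx) < n_tokens:
        draft, cur = [], list(out)
        for _ in range(k):                  # cheap drafting phase
            t = draft_next(cur)
            draft.append(t)
            cur.append(t)
        for t in draft:                     # verification phase
            if len(out) - len(ctx) >= n_tokens:
                break
            correct = target_next(out)
            out.append(correct)             # always emit the target's token
            if t != correct:                # mismatch: discard the rest of the draft
                break
    return out[len(ctx):]

ctx = [1, 2, 3]
greedy, cur = [], list(ctx)
for _ in range(20):                         # reference: plain greedy decoding
    t = target_next(cur)
    greedy.append(t)
    cur.append(t)

spec = speculative_decode(ctx, 20)
print(spec == greedy)
```

The speedup comes from the verification pass scoring all k drafted positions in one forward pass of the large model, so high draft-acceptance rates translate directly into fewer large-model invocations.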

ModelBest Hits $1B+ Valuation for On-Device Foundation Models

ModelBest, a Chinese developer of on-device AI foundation models, raised several hundred million RMB, reaching a valuation exceeding $1 billion. The funding will accelerate its push to deploy efficient models directly on smartphones and IoT devices.

95% relevant

Ethan Mollick: Gemma 4 Impressive On-Device, But Agentic Workflows Doubted

Wharton professor Ethan Mollick finds Google's Gemma 4 powerful for on-device use but is skeptical about its ability to execute true agentic workflows, citing limitations in judgment and self-correction.

75% relevant

mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup

The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.

85% relevant

OpenAI Codex Now Translates C++, CUDA, and Python to Swift and Python for CoreML Model Conversion

OpenAI's Codex AI code generator is now being used to automatically rewrite C++, CUDA, and Python code into Swift and Python specifically for CoreML model conversion, a previously manual and error-prone process for Apple ecosystem deployment.

89% relevant

Ollama Now Supports Apple MLX Backend for Local LLM Inference on macOS

Ollama, the popular framework for running large language models locally, has added support for Apple's MLX framework as a backend. This enables more efficient execution of models like Llama 3.2 and Mistral on Apple Silicon Macs.

85% relevant

Apple Reportedly Gains Full Internal Access to Google's Gemini for On-Device Model Distillation

A report claims Apple's AI deal with Google includes full internal model access, enabling distillation of Gemini's reasoning into smaller, on-device models. This would allow Apple to build specialized, efficient AI without relying solely on cloud APIs.

95% relevant

KAIST Develops 'SoulMate' AI Chip for Real-Time, On-Device Personalization

KAIST researchers have developed a new AI semiconductor, 'SoulMate,' that enables real-time, on-device learning of user habits and preferences. The chip combines RAG and LoRA for instant personalization while consuming minimal power, aiming for commercialization by 2027.

70% relevant

Edge AI Breakthrough: Qwen3.5 2B Runs Locally on iPhone 17 Pro, Redefining On-Device Intelligence

Alibaba's Qwen3.5 2B model now runs locally on iPhone 17 Pro devices, marking a significant breakthrough in edge AI. This development enables sophisticated language processing without cloud dependency, potentially transforming mobile AI applications and user privacy paradigms.

85% relevant

Apple's Neural Engine Jailbroken: Researcher Unlocks On-Device AI Training Capabilities

A researcher has reverse-engineered Apple's private Neural Engine APIs to enable direct transformer training on M-series chips, bypassing CoreML restrictions. This breakthrough could enable battery-efficient local model training and fine-tuning without cloud dependency.

95% relevant

Apple's Private Cloud Compute: Leak Suggests 4x M2 Ultra Cluster for On-Device AI Offload

A leak suggests Apple's Private Cloud Compute for AI may be built on clusters of four M2 Ultra chips, potentially offering high-performance, private server-side processing for iPhone AI tasks. This would mark Apple's strategic move into dedicated, privacy-focused AI infrastructure.

85% relevant

SAM 3 + MLX Enables Local, Multi-Object Video Tracking Without Cloud APIs

A developer has combined Meta's Segment Anything Model 3 (SAM 3) with Apple's MLX framework to enable local, on-device object tracking in videos. This bypasses cloud API costs and latency for computer vision tasks.

85% relevant

RunAnywhere's MetalRT Engine Delivers Breakthrough AI Performance on Apple Silicon

RunAnywhere has launched MetalRT, a proprietary GPU inference engine that dramatically accelerates on-device AI workloads on Apple Silicon. Their open-source RCLI tool demonstrates sub-200ms voice AI pipelines, outperforming existing solutions like llama.cpp and Apple's MLX.

80% relevant

Google's TensorFlow 2.21 Revolutionizes Edge AI with Unified LiteRT Framework

Google has launched TensorFlow 2.21, marking LiteRT's transition to a production-ready universal on-device inference framework. This major update delivers faster GPU performance, new NPU acceleration, and seamless PyTorch edge deployment, effectively replacing TensorFlow Lite for mobile and edge applications.

75% relevant

Qwen2.5-7B-Instruct 4-bit DWQ Model Released for Apple MLX

A developer has ported a 4-bit quantized Qwen2.5-7B-Instruct model to Apple's MLX framework. This makes the capable 7B model more efficient to run on Apple Silicon Macs.

77% relevant

7 Free GitHub Repos for Running LLMs Locally on Laptop Hardware

A developer shared a list of seven key GitHub repositories, including AnythingLLM and llama.cpp, that allow users to run LLMs locally without cloud costs. This reflects the growing trend of efficient, private on-device AI inference.

75% relevant

Storing Less, Finding More: Novelty Filtering Architecture for Cross-Modal Retrieval on Edge Cameras

A new streaming retrieval architecture uses an on-device 'epsilon-net' filter to retain only semantically novel video frames, dramatically improving cross-modal search accuracy while reducing power consumption to 2.7 mW. This addresses the fundamental problem of redundant frames crowding out correct results in continuous video streams.

82% relevant
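The paper's exact filter isn't reproduced above, but a greedy epsilon-net, the standard construction the name suggests, keeps an incoming frame only when its embedding lies at least epsilon away from everything already retained. The NumPy sketch below simulates two static scenes of near-duplicate frames; the embeddings and threshold are invented for illustration:

```python
import numpy as np

def epsilon_net_filter(embeddings, eps):
    """Greedy epsilon-net: retain a frame only if its embedding is at least
    eps away from every embedding already kept. Near-duplicate frames are
    dropped before they can crowd out novel content at retrieval time."""
    kept, kept_idx = [], []
    for i, e in enumerate(embeddings):
        if all(np.linalg.norm(e - k) >= eps for k in kept):
            kept.append(e)
            kept_idx.append(i)
    return kept_idx

# Simulated stream: two static scenes, each producing 50 near-duplicate frames.
rng = np.random.default_rng(0)
scene_a = rng.normal(size=8)
frames = [scene_a + rng.normal(scale=0.01, size=8) for _ in range(50)]
scene_b = scene_a + 3.0  # a clearly different scene in embedding space
frames += [scene_b + rng.normal(scale=0.01, size=8) for _ in range(50)]

idx = epsilon_net_filter(frames, eps=1.0)
print(idx)  # only one representative frame per scene survives
```

Storing one representative per epsilon-ball is also what drives the power savings: downstream indexing and search touch a handful of frames instead of the full stream.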

VHS: Latent Verifier Cuts Diffusion Model Verification Cost by 63.3%, Boosts GenEval by 2.7%

Researchers propose Verifier on Hidden States (VHS), a verifier operating directly on DiT generator features, eliminating costly pixel-space decoding. It reduces joint generation-and-verification time by 63.3% and improves GenEval performance by 2.7% versus MLLM verifiers.

95% relevant

Violoop's Hardware Bet: A New Frontier in AI Interaction Beyond the Screen

Hardware startup Violoop has secured multi-million dollar funding to develop the world's first 'physical-level AI Operator,' aiming to move AI interaction from purely digital interfaces to tangible, desktop-integrated hardware devices.

95% relevant

Beyond Anomaly Detection: Protecting High-Value Affiliate Partnerships in Luxury Retail

Traditional ML fraud detection systems often flag top-performing luxury affiliates as suspicious due to their outlier performance. This article explores the baseline problem and presents a governance-first approach to distinguish true fraud from legitimate viral success.

70% relevant