Apple Silicon
30 articles about Apple Silicon in AI news
Gemma 4 Ported to MLX-Swift, Runs Locally on Apple Silicon
Google's Gemma 4 language model has been ported to the MLX-Swift framework by a community developer, making it available for local inference on Apple Silicon Macs and iOS devices through the LocallyAI app.
Independent Tester Claims Near-Lossless LLM Compression at 3.5 Bits-Per-Weight on Apple Silicon
Independent AI researcher Matthew Weinbach reports near-lossless compression of large language models on Apple Silicon, storing models at 3.5 bits per weight while staying within 1-2% of bf16-precision quality.
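The article doesn't name the tooling, but the 3.5 figure matches MLX's grouped affine quantization exactly: 3-bit weights plus one fp16 scale and bias per 64-weight group add 32/64 = 0.5 bits per weight. A minimal sketch with mlx_lm's convert API (the model repo is a placeholder):

```python
# Hypothetical reproduction of a ~3.5 bpw setup with mlx_lm: 3-bit weights
# plus one fp16 scale and bias per 64-weight group adds 32/64 = 0.5 bits
# per weight. The article does not say which tooling Weinbach used.
from mlx_lm import convert

convert(
    hf_path="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model
    mlx_path="llama-3.2-3b-3bit",
    quantize=True,
    q_bits=3,         # 3-bit weights
    q_group_size=64,  # fp16 scale + bias per group -> +0.5 bits/weight
)
```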
mlx-vlm v0.4.2 Adds SAM3, DOTS-MOCR Models and Critical Fixes for Vision-Language Inference on Apple Silicon
mlx-vlm v0.4.2 released with support for Meta's SAM3 segmentation model and DOTS-MOCR document OCR, plus fixes for the Qwen3.5, LFM2-VL, and Magistral models. The release enables efficient vision-language inference on Apple Silicon via the MLX framework.
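For readers new to the library, mlx-vlm pairs each model with its processor and a chat-template helper; a minimal inference sketch along the lines of the project's README follows (the model repo is a placeholder, and generate()'s argument order may differ between versions):

```python
# Minimal mlx-vlm inference sketch, modeled on the project's README.
# The model repo is a placeholder and generate()'s argument order may
# vary across versions, so treat this as an assumption.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"  # placeholder model
model, processor = load(model_path)
config = load_config(model_path)

images = ["page.png"]  # local path or URL
prompt = apply_chat_template(processor, config, "Transcribe this page.", num_images=len(images))
output = generate(model, processor, prompt, images, verbose=False)
print(output)
```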
Qwen3-TTS Added to mlx-tune, Enabling Full Qwen Model Fine-Tuning on Apple Silicon Macs
The mlx-tune library now supports Qwen3-TTS, making the entire Qwen model stack—including the new text-to-speech model—fine-tunable on Apple Silicon Macs. This expands local AI development options for researchers and developers.
RunAnywhere's MetalRT Engine Delivers Breakthrough AI Performance on Apple Silicon
RunAnywhere has launched MetalRT, a proprietary GPU inference engine for accelerating on-device AI workloads on Apple Silicon. Its open-source RCLI tool demonstrates sub-200ms voice AI pipelines, which the company says outperform existing solutions such as llama.cpp and Apple's MLX.
Roboflow's RF-DETR Model Ported to Apple MLX, Enabling Real-Time On-Device Instance Segmentation
Roboflow's RF-DETR object detection model is now available on Apple's MLX framework, enabling real-time instance segmentation on Apple Silicon devices. The port unlocks new on-device visual analysis applications for robotics and for augmenting vision-language models.
Ollama Now Supports Apple MLX Backend for Local LLM Inference on macOS
Ollama, the popular framework for running large language models locally, has added support for Apple's MLX framework as a backend. This enables more efficient execution of models like Llama 3.2 and Mistral on Apple Silicon Macs.
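Because the backend swap happens inside the Ollama server, existing client code needs no changes; a minimal sketch with the official ollama Python client (the model tag is illustrative):

```python
# Minimal chat call with the official ollama Python client. Backend
# selection (MLX vs. the default engine) happens server-side, so client
# code like this is unchanged; the model tag is illustrative.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Summarize MLX in one sentence."}],
)
print(response["message"]["content"])
```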
mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup
The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.
Open-Source AI Assistant Runs Locally on MacBook Air M4 with 16GB RAM, No API Keys Required
A developer showcased a complete AI assistant running entirely on a MacBook Air M4 with 16GB RAM, using open-source models with no cloud API calls. This demonstrates the feasibility of capable local AI on consumer-grade Apple Silicon hardware.
Gemma 4 26B A4B Hits 45.7 tokens/sec Decode Speed on MacBook Air via MLX Community
A community benchmark shows the Gemma 4 26B A4B model running at 45.7 tokens/sec decode speed on a MacBook Air using the MLX framework. This highlights rapid progress in efficient local deployment of mid-size language models on consumer Apple Silicon.
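Numbers like this are straightforward to sanity-check: mlx_lm prints generation tokens-per-second when verbose output is enabled. A sketch following mlx_lm's standard usage (the quantized repo name is a guess, since the benchmark doesn't cite one):

```python
# Reproducing a decode-speed number with mlx_lm: verbose=True prints
# prompt and generation tokens-per-second after the run. The quantized
# repo name below is a guess, not a confirmed mlx-community upload.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-4-26b-a4b-4bit")  # hypothetical repo
messages = [{"role": "user", "content": "Explain KV caches in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```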
Apple's M5 Pro and Max: Fusion Architecture Redefines AI Computing on Silicon
Apple unveils M5 Pro and M5 Max chips with groundbreaking Fusion Architecture, merging two 3nm dies into a single SoC. The chips deliver up to 30% faster CPU performance and over 4x peak GPU compute for AI workloads compared to previous generations.
Qualcomm NPU Shows 6-8x OCR Speed-Up Over CPU in Mobile Workload
A benchmark shows Qualcomm's dedicated NPU processing OCR workloads 6-8 times faster than the device's CPU. This highlights the growing efficiency gap for AI tasks on mobile silicon.
Developer Ranks NPU Model Compilation Ease: Apple 1st, AMD Last
Developer @mweinbach ranked the ease of using AI coding agents to compile ML models for NPUs. Apple's ecosystem was rated easiest, while AMD's tooling was ranked most difficult.
Apple M5 Max NPU Benchmarks 2x Faster Than Intel Panther Lake NPU in Parakeet v3 AI Inference Test
A leaked benchmark using the Parakeet v3 AI speech recognition model shows Apple's next-generation M5 Max Neural Processing Unit (NPU) delivering double the inference speed of Intel's competing Panther Lake NPU. This real-world test provides early performance data in the intensifying on-device AI hardware race.
Meta's SAM 3 Vision Model Ported to Apple's MLX Framework, Enabling Real-Time Tracking on M3 Max
Meta's Segment Anything Model 3 (SAM 3) has been ported to Apple's MLX framework, enabling real-time object tracking on an M3 Max MacBook Pro. This demonstrates efficient on-device execution of a foundational vision model without cloud dependency.
Apple's Private Cloud Compute: Leak Suggests 4x M2 Ultra Cluster for On-Device AI Offload
A leak suggests Apple's Private Cloud Compute for AI may be built on clusters of four M2 Ultra chips, potentially offering high-performance, private server-side processing for iPhone AI tasks. This would mark Apple's strategic move into dedicated, privacy-focused AI infrastructure.
TurboQuant Ported to Apple MLX, Claims 75% Memory Reduction with Minimal Performance Loss
Developer Prince Canuma has successfully ported the TurboQuant quantization method to Apple's MLX framework, reporting a 75% reduction in memory usage with nearly no performance degradation for on-device AI models.
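TurboQuant's method isn't detailed here, but the headline figure is close to generic 4-bit math: fp16 to 4-bit is a 4x (75%) cut before per-group scale overhead. A sketch of that arithmetic with MLX's built-in quantizer, which TurboQuant presumably improves on:

```python
# Generic memory math for 4-bit grouped quantization using MLX's built-in
# quantizer; TurboQuant's actual method is not public here and presumably
# differs, so this only illustrates where a ~75% figure comes from.
import mlx.core as mx

w = mx.random.normal((4096, 4096)).astype(mx.float16)
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

fp16_bytes = w.nbytes
quant_bytes = w_q.nbytes + scales.nbytes + biases.nbytes
print(f"reduction: {1 - quant_bytes / fp16_bytes:.1%}")  # ~72% after group overhead
```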
Apple Reportedly Gains Full Internal Access to Google's Gemini for On-Device Model Distillation
A report claims Apple's AI deal with Google includes full internal model access, enabling distillation of Gemini's reasoning into smaller, on-device models. This would allow Apple to build specialized, efficient AI without relying solely on cloud APIs.
Apple Siri Rebuilt as System-Wide AI Agent in iOS 27, Powered by Apple Foundation Models and Google Gemini
Apple is rebuilding Siri into a conversational system-wide AI agent with deep app integration and personal data access, launching in iOS 27. The overhaul includes a standalone app, web browsing, and writing tools, powered by Apple's models and a Google Gemini partnership.
AI Gold Rush Strains Apple Hardware: High-Memory Macs Sell Out as Local AI Agents Go Mainstream
A surge in demand for local AI development has created severe inventory shortages for high-memory Apple hardware. Mac Studio orders with 128GB or 512GB RAM face 6+ week delays as consumers buy up every available unit to run powerful AI agents like OpenClaw.
Apple's Neural Engine Jailbroken: Researchers Unlock Full Training Capabilities on M-Series Chips
Security researchers have reverse-engineered Apple's Neural Engine, bypassing private APIs to enable full neural network training directly on ANE hardware. The breakthrough unlocks 15.8 TFLOPS of compute, previously restricted to inference, across all M-series devices.
Open-Source Project Unlocks Apple's On-Device AI for Any Device on Your Network
Perspective Intelligence Web, an open-source project, enables any device with a browser to access Apple's powerful on-device AI models running locally on a Mac. This MIT-licensed solution addresses privacy concerns by keeping all processing on your private network while extending Apple Intelligence capabilities to Windows, Linux, Android, and Chromebook devices.
Apple's 'Visual Intelligence' Vision: How AI-Powered Cameras Will Redefine Wearables
Apple is developing 'Visual Intelligence'—AI that interprets the physical world through cameras—as the foundation for its next generation of wearables, including smart glasses, advanced AirPods, and a camera-equipped pendant.
OpenAI Codex Now Translates C++, CUDA, and Python to Swift and Python for CoreML Model Conversion
OpenAI's Codex AI code generator is now being used to automatically rewrite C++, CUDA, and Python code into Swift and Python specifically for CoreML model conversion, a previously manual and error-prone process for Apple ecosystem deployment.
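The destination of that pipeline is typically coremltools; as a point of reference, here is a minimal conversion of a traced PyTorch model to a Core ML package (the toy model and shapes are illustrative, not from the article):

```python
# Illustrative Core ML conversion of a tiny traced PyTorch model; this is
# the kind of target code the article says Codex now generates. The model
# and shapes are toy examples, not from the article.
import torch
import coremltools as ct

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU()).eval()
example = torch.rand(1, 4)
traced = torch.jit.trace(net, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    convert_to="mlprogram",  # Core ML "ML program" format
)
mlmodel.save("TinyNet.mlpackage")
```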
SAM 3 + MLX Enables Local, Multi-Object Video Tracking Without Cloud APIs
A developer has combined Meta's Segment Anything Model 3 (SAM 3) with Apple's MLX framework to enable local, on-device multi-object tracking in videos. This bypasses cloud API costs and latency for computer vision tasks.
Memory Market Squeeze Threatens iPhone Price Hikes as AI Demands Strain Supply
A global RAM shortage and price increases could force Apple to raise iPhone prices by up to $250, according to industry analysis. The tech giant is reportedly unwilling to absorb the cost, passing it directly to consumers amid surging memory demands from AI applications.
Open-Source Crew of 8 Local AI Agents Replaces Notion, Manages Obsidian Vault
A researcher has built a fully local, open-source system of 8 specialized AI agents that cooperate to manage an Obsidian vault, handling notes, inboxes, meetings, and deadlines. It replaces separate tools like Notion and inbox triagers with an autonomous, interconnected crew.
AI Weekly: GPT-6 Rumors, DeepSeek V4 on Huawei, Anthropic Models, Qwen 3.6-Plus
A weekly roundup video aggregates major AI rumors and announcements, including unverified GPT-6 details, DeepSeek V4 reportedly running on Huawei hardware, and launches of Anthropic's Conway and Ultraplan and Alibaba's Qwen 3.6-Plus.
Nvidia Claims MLPerf Inference v6.0 Records with 288-GPU Blackwell Ultra Systems, Highlights 2.7x Software Gains
MLCommons released MLPerf Inference v6.0 results, introducing multimodal and video model tests. Nvidia set records using 288-GPU Blackwell Ultra systems and achieved a 2.7x performance jump on DeepSeek-R1 via software optimizations alone.
TSMC 2nm Capacity Constraints Create Opening for Samsung in AI Chip Foundry Race
TSMC has reportedly hit a 'hard capacity wall' at its 2nm node, creating a strategic opportunity for Samsung Foundry to capture AI accelerator business from major clients like Nvidia and OpenAI. This bottleneck could reshape the competitive landscape for advanced semiconductor manufacturing.