gentic.news — AI News Intelligence Platform

custom silicon

30 articles about custom silicon in AI news

Qualcomm Ships Hyperscaler Custom Silicon by December 2026

Qualcomm is developing custom silicon for an unnamed hyperscaler, with shipments expected December 2026, marking its most concrete data-center comeback move.

70% relevant

Qualcomm Builds Dedicated CPU for Agentic AI, Enters Hyperscale Silicon Market

Qualcomm's CEO revealed a dedicated CPU for agentic AI, a custom silicon deal with a hyperscaler shipping in December 2026, and plans for agentic smartphones. The pivot challenges the GPU-centric AI infrastructure consensus.

100% relevant

Nvidia Invests $2B in Marvell to Deepen NVLink Fusion Tie-Up

Nvidia invested $2B in Marvell to deepen their NVLink Fusion partnership, integrating Marvell's custom silicon into Nvidia's AI interconnect fabric.

85% relevant

Google Splits TPU Line: 8t for Training, 8i for Inference

At Cloud Next 2026, Google introduced two new AI chips — TPU 8t for training and TPU 8i for inference — splitting its custom silicon for the first time. OpenAI, Anthropic, and Meta are buying multi-gigawatt TPU capacity, signaling a crack in NVIDIA's 81% market share.

100% relevant

Google, Marvell in Talks to Co-Develop New AI Chips, Including TPU-Optimized MPU

Google is reportedly in talks with Marvell Technology to co-develop two new AI chips: a memory processing unit (MPU) to pair with TPUs and a new, optimized TPU. This move is a direct effort to bolster Google's custom silicon stack and compete with Nvidia's dominance.

95% relevant

Meta Expands Broadcom Partnership for Next-Gen AI Infrastructure

Meta is expanding its partnership with semiconductor giant Broadcom to co-develop its next-generation AI infrastructure. This move signals a continued, long-term commitment to custom silicon for AI training and inference.

85% relevant

SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU

SemiAnalysis reports that NVIDIA's direct customer feedback is pushing the industry toward disaggregated inference architectures, in which specialized LPUs can outperform GPUs on specific pipeline tasks.

85% relevant

MLX-Benchmark Suite Launches as First Comprehensive LLM Eval for Apple Silicon

The MLX-Benchmark Suite has been released as the first comprehensive evaluation framework for Large Language Models running on Apple's MLX framework. It provides standardized metrics for models optimized for Apple Silicon hardware.

85% relevant

MLX-VLM Adds Continuous Batching, OpenAI API, and Vision Cache for Apple Silicon

The next release of MLX-VLM will introduce continuous batching, an OpenAI-compatible API, and vision feature caching for multimodal models running locally on Apple Silicon. These optimizations promise up to 228x speedups on cache hits for models like Gemma4.

95% relevant
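An OpenAI-compatible API means any existing OpenAI client can target the local MLX-VLM server unchanged. The sketch below only assembles a chat-completions payload with an image input in the OpenAI message format; the server's endpoint path and port are not specified in the announcement and would come from the MLX-VLM release itself.

```python
# Sketch: an OpenAI-style chat-completions payload with an image input,
# as a client would send to an OpenAI-compatible local server.
import json

def build_chat_request(model: str, image_url: str, question: str) -> dict:
    """Build an OpenAI-format multimodal chat request (not sent anywhere)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_chat_request("gemma4", "file:///tmp/photo.jpg", "What is in this image?")
print(json.dumps(payload, indent=2))
```

Because the format matches OpenAI's chat API, tools built against hosted endpoints can be pointed at the local server by swapping the base URL.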

Anthropic Considers Custom AI Chips, Following Google & OpenAI

Anthropic is reportedly considering developing custom AI chips, a strategic move to gain control over its compute infrastructure and reduce costs. This follows similar initiatives by Google, Amazon, and OpenAI.

85% relevant

Qwen3-TTS Added to mlx-tune, Enabling Full Qwen Model Fine-Tuning on Apple Silicon Macs

The mlx-tune library now supports Qwen3-TTS, making the entire Qwen model stack—including the new text-to-speech model—fine-tunable on Apple Silicon Macs. This expands local AI development options for researchers and developers.

85% relevant

Nvidia's Jensen Huang Dismisses Custom AI Chip Threat: 'Science Projects' Versus 'AI Factories'

Nvidia CEO Jensen Huang confidently dismissed concerns about custom AI chips challenging Nvidia's dominance, framing competitors' efforts as 'science projects' while Nvidia builds revenue-generating 'AI factories' with a complete platform approach.

85% relevant

RunAnywhere's MetalRT Engine Delivers Breakthrough AI Performance on Apple Silicon

RunAnywhere has launched MetalRT, a proprietary GPU inference engine that dramatically accelerates on-device AI workloads on Apple Silicon. Their open-source RCLI tool demonstrates sub-200ms voice AI pipelines, outperforming existing solutions like llama.cpp and Apple's MLX.

80% relevant

Apple Reportedly Developing 'Balta' AI ASIC for Cloud Compute

A Morgan Stanley report indicates Apple is accelerating development of a custom ASIC, codenamed 'Balta,' for AI cloud and hybrid compute. This marks Apple's first known move to design silicon for its data centers, not just consumer devices.

85% relevant

Developer Achieves 395x RTFx on M5 Max with Fastest Parakeet v3 for Apple ANE

Developer @mweinbach has optimized the Parakeet v3 speech recognition model for Apple's Neural Engine, achieving a 395x real-time factor on an M5 Max chip. This represents a significant performance leap for on-device AI inference on Apple Silicon.

87% relevant
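RTFx (inverse real-time factor) is audio duration divided by processing time, so 395x means transcribing far faster than playback. A quick calculation shows what that buys in practice:

```python
# Inverse real-time factor: audio duration / processing time.
# At 395x, an hour of audio transcribes in roughly nine seconds.

def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock time to transcribe audio at a given inverse real-time factor."""
    return audio_seconds / rtfx

one_hour = 3600.0
t = processing_seconds(one_hour, 395.0)
print(f"{t:.1f} s")  # ~9.1 s for one hour of audio
```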

Qwen2.5-7B-Instruct 4-bit DWQ Model Released for Apple MLX

A developer has ported a 4-bit quantized Qwen2.5-7B-Instruct model to Apple's MLX framework. This makes the capable 7B model more efficient to run on Apple Silicon Macs.

77% relevant
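The efficiency gain from 4-bit quantization is mostly memory: weight storage shrinks roughly 4x versus fp16, which is what makes a 7B model comfortable on a consumer Mac. A back-of-envelope sketch (weights only; real footprints are higher once the KV cache, activations, and any unquantized layers are counted):

```python
# Approximate weight memory for a 7B-parameter model at different precisions.

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in GB: parameters x bits, converted to bytes then GB."""
    return params * bits_per_weight / 8 / 1e9

params = 7e9
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_gb(params, bits):.1f} GB")
# fp16: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```

At 3.5 GB of weights, the model fits easily in the unified memory of a 16GB Apple Silicon machine with room left for the KV cache.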

Nvidia Invests $2B in Marvell to Expand NVLink Fusion Chip Partnership

Nvidia is investing $2 billion in Marvell Technology to deepen their partnership on NVLink Fusion, a chip-to-chip interconnect crucial for scaling AI training clusters. This strategic move aims to secure supply and accelerate development of high-bandwidth links between GPUs and custom AI accelerators.

84% relevant

Aehr Test Systems Lands $41M AI Chip Order; H2 Bookings Top $92M

Aehr Test Systems received a record $41 million production order from a key hyperscale AI customer. Total bookings for the second half of its fiscal year exceeded $92 million, highlighting surging demand for semiconductor test and burn-in equipment.

74% relevant

Technical Implementation: Building a Local Fine-Tuning Engine with MLX

A developer shares a backend implementation guide for automating the fine-tuning process of AI models using Apple's MLX framework. This enables private, on-device model customization without cloud dependencies, which is crucial for handling sensitive data.

78% relevant
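A backend that automates fine-tuning typically assembles and launches a training command per job. As a minimal sketch, one such step might build an invocation of mlx-lm's LoRA trainer; the `mlx_lm.lora` entry point and flag names here are assumptions based on the mlx-lm package and should be checked against the installed version.

```python
# Sketch of a backend step that assembles a local LoRA fine-tuning command.
# Entry point and flags are assumptions; verify against your mlx-lm version.

def build_finetune_cmd(model: str, data_dir: str, iters: int = 600) -> list[str]:
    """Build (but do not run) a command list suitable for subprocess.run."""
    return [
        "python", "-m", "mlx_lm.lora",
        "--model", model,
        "--train",
        "--data", data_dir,
        "--iters", str(iters),
    ]

cmd = build_finetune_cmd("Qwen/Qwen2.5-7B-Instruct", "./data")
print(" ".join(cmd))
```

Keeping the command construction separate from execution makes the automation easy to test without launching a training run.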

AWS CEO: All Latest Anthropic Models Trained on Amazon Trainium

Amazon Web Services CEO Matt Garman stated that all of Anthropic's latest AI models are trained on AWS's custom Trainium chips. This confirms the deepening technical and strategic integration between the AI lab and its primary cloud investor.

89% relevant

Intel Joins SpaceX, xAI, Tesla in 'Terafab' Chip Project

Intel announced it is joining the 'Terafab' project alongside SpaceX, xAI, and Tesla. The collaboration aims to rearchitect silicon fabrication technology, likely to support the massive compute demands of AI and aerospace.

85% relevant

Open-Source AI Assistant Runs Locally on MacBook Air M4 with 16GB RAM, No API Keys Required

A developer showcased a complete AI assistant running entirely on a MacBook Air M4 with 16GB RAM, using open-source models with no cloud API calls. This demonstrates the feasibility of capable local AI on consumer-grade Apple Silicon hardware.

93% relevant

Roboflow's RF-DETR Model Ported to Apple MLX, Enabling Real-Time On-Device Instance Segmentation

Roboflow's RF-DETR object detection model is now available on Apple's MLX framework, enabling real-time instance segmentation on Apple Silicon devices. This port unlocks new on-device visual analysis applications for robotics and vision-language workloads.

89% relevant

Ollama Now Supports Apple MLX Backend for Local LLM Inference on macOS

Ollama, the popular framework for running large language models locally, has added support for Apple's MLX framework as a backend. This enables more efficient execution of models like Llama 3.2 and Mistral on Apple Silicon Macs.

85% relevant
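From a client's perspective the backend swap is transparent: Ollama exposes the same local REST API whichever engine executes the model. The sketch below only builds a request against Ollama's documented `/api/generate` endpoint (default port 11434); actually sending it requires a running Ollama server.

```python
# Build a request for Ollama's local generate endpoint (not sent here).
# Default server address per Ollama's API docs: localhost:11434.
import json

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Non-streaming generate request in Ollama's API format."""
    return {"model": model, "prompt": prompt, "stream": False}

req = build_generate_request("llama3.2", "Summarize NVLink Fusion in one sentence.")
print(OLLAMA_URL)
print(json.dumps(req))
```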

Google's $1.9 Trillion Vertical Integration Strategy: Building an AI Empire from Chips to Power Grid

Google is investing $1.9 trillion over the next decade to control every layer of the AI stack, from custom TPU chips to power infrastructure. This vertical integration strategy creates a competitive moat that could reshape the entire AI industry landscape.

95% relevant

Diffusion Architecture Breaks Speed Barrier: Inception's Mercury 2 Hits 1,000 Tokens/Second

Inception's Mercury 2 achieves unprecedented text generation speeds of 1,000 tokens per second using diffusion architecture borrowed from image AI. This represents a 10x speed advantage over leading models like Claude 4.5 Haiku and GPT-5 Mini without requiring custom hardware.

95% relevant

NVIDIA Vera Rubin VR NVL72: Value Extraction Engine Arrives

With the Vera Rubin VR NVL72, NVIDIA shifts from selling value to extracting it, pricing against customers' total cost of ownership (TCO). SemiAnalysis argues this overturns the prior pricing paradigm.

87% relevant

ODMs Evolve from Manufacturers to AI Infrastructure Partners

ODMs are shifting from pure manufacturers to design and integration partners for AI racks, driven by growing GPU/ASIC complexity and liquid-cooling requirements.

75% relevant

Google-Anthropic 5 GW Deal: AI Capacity Pre-Sold at Gigawatt Scale

Google and Anthropic signed a 5 GW compute deal, pre-selling AI capacity at gigawatt scale and reshaping infrastructure financing.

100% relevant

AI Data Centers Face 220GW Grid Jam, Power Infrastructure Becomes Bottleneck

PJM's 220GW interconnection queue shows AI data center growth is now constrained by power grid capacity, not compute. Hyperscalers face 3-7 year delays.

75% relevant