gpu
30 articles about gpu in AI news
mlx-vlm v0.6.2 Adds Gemma 4 QAT Support for Local GPUs
mlx-vlm v0.6.2 adds launch-day support for Google DeepMind's Gemma 4 QAT checkpoints, enabling local inference on consumer GPUs and edge devices with video input for the 12B model.
Cerebras Hits 981 Tokens/sec on 1T-Parameter Kimi K2.6, Claims 6.7× GPU Cloud Speedup
Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model, a 6.7× speedup over the next GPU cloud, validated by an independent third party.
Nvidia Networking Revenue Hits $14.8B, Up 199% as AI Spending Shifts Beyond GPUs
Nvidia's Q1 FY2027 networking revenue surged 199% to $14.8B, signaling AI infrastructure spending is moving beyond GPUs into full-system networking. New reporting splits into Hyperscale and ACIE segments reflect a broadening customer base beyond hyperscalers.
train-llm-from-scratch: 1B-Parameter LLM on a Single GPU
train-llm-from-scratch trains billion-parameter LLMs on a single GPU, cutting costs from $10M+ to consumer hardware.
vLLM Optimizations Cut Voice AI Latency by 40% on 6-GPU Cluster
vLLM optimizations on a 6-GPU cluster reduced voice AI latency by 40% for a Qwen-based system, enabling 500 concurrent sessions per node without hardware upgrades.
CoreWeave, Nebius Earnings Show AI Race Shifts From GPUs to Power
CoreWeave and Nebius Q1 earnings show AI infrastructure race shifting from GPU supply to power and scale, with combined capex guidance exceeding $55B.
Cerebras IPO Challenges GPU Scaling Orthodoxy
Cerebras filed for IPO on April 21, betting wafer-scale chips can disrupt Nvidia's GPU cluster model for AI workloads.
MLX CUDA Backend Passes All Tests, Closing Apple GPU Gap
MLX CUDA backend passes all tests, enabling NVIDIA GPU support. Milestone bridges Apple Silicon and CUDA ecosystems for ML workloads.
NHN Deploys 7,656-GPU AI Cluster in Seoul
NHN launched a 7,656-GPU cluster in Seoul, South Korea, for domestic enterprise AI workloads. The cluster targets inference and training, competing with Naver and Kakao.
VS Code Now Connects Directly to Google Colab With Free T4 GPU
Google Colab integrates with VS Code, offering a free T4 GPU inside the editor, bypassing cloud GPU providers.
Detecting AI Images: Metadata Exposes Generators, No GPU Needed
AI image detection via metadata analysis exposes generators like Google's Gemini and Meta's Llama without GPU clusters, highlighting a simple but effective method.
AMD Launches PCIe GPU for AI Workloads, Targets Existing Server Install Base
AMD launched a PCIe-based GPU for AI workloads, targeting existing servers. The card provides immediate boost without new data center buildouts.
Kunluncore Files STAR Market IPO, Claims 32K GPU Cluster First
Kunluncore filed for a STAR Market IPO, claiming a 32K GPU cluster first, testing investor appetite for domestic AI chips.
NVIDIA, DOE Build 100K-GPU Supercomputer for Science
DOE and NVIDIA announced Solstice, a 100K-GPU Vera Rubin supercomputer delivering 5,000 exaflops, and Equinox with 10K Blackwell GPUs.
Anthropic's 220K GPU Cluster: $5B Compute Bet Revealed
Anthropic reportedly has 220K NVIDIA GPUs and 310MW, implying a >$5B compute cluster, 3x OpenAI's largest.
OpenAI's MRC Protocol Sprays Packets Across 100+ Paths to Fix GPU Stragglers
OpenAI open-sourced MRC, a networking protocol that sprays packets across hundreds of paths to reduce GPU idle time from congestion and failures, contributed to OCP.
Nscale to Deploy 66K+ Rubin GPUs for Microsoft in Portugal
Nscale will deploy 66,000+ NVIDIA Rubin GPUs for Microsoft at Portugal's Start Campus. The deal is a first for Rubin and signals Microsoft's geographic diversification.
NVIDIA Feynman GPU Power Semi Content Hits $191K, 17× Blackwell
NVIDIA Feynman GPUs require $191K in power semiconductors per system, 17× Blackwell, driven by 800V DC architecture shift.
RoundPipe: Full Fine-Tune 32B Models on a Single 24GB GPU
RoundPipe fine-tunes 32B models on a single 24GB GPU with 1.5-2.2× speedups via round-robin pipeline dispatch.
Open-Weight 1T Model Inference Margins Hit 88% on Rented GPUs
Renting a 128 GPU cluster to serve a 1T open model yields ~88% margin on tokens sold at $0.002/1K, exposing a structural arbitrage over proprietary APIs.
Moore Threads Q1 Revenue Up, Building 100K-GPU AI Cluster
Moore Threads reports Q1 2026 revenue growth and confirms progress building a 100,000-GPU cluster for AI training, signaling growing domestic AI infrastructure in China despite US export controls.
Jensen Huang's 30-Year TSMC Battle: From 3D Graphics to AI GPUs
A 30-year-old comic shows Jensen Huang convincing TSMC to supply wafers for 3D graphics chips. Today, he's still fighting for wafer supply, but now for AI GPUs, alongside Broadcom, AMD, MediaTek, and Amazon.
SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU
SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.
DARPA Leases 50 Nvidia H100 GPUs for Biological AI Program
DARPA's Biological Technologies Office is procuring 50 Nvidia HGX H100 GPU systems for its NODES program, with hardware delivery required within one month. This represents a significant government investment in AI infrastructure for biological research applications.
Cisco Reveals Scale-Across GPU Networking Needs 14x DCI Bandwidth
Cisco's chief architect detailed the massive bandwidth requirements for connecting AI clusters via 'scale-across' GPU networking, which needs 14x the capacity of traditional data center interconnects. This shift is creating a multi-billion dollar market for 800G coherent pluggables and deep-buffered switches.
LeWorldModel Solves JEPA Collapse with 15M Params, Trains on Single GPU
Researchers published LeWorldModel, solving the representation collapse problem in Yann LeCun's JEPA architecture. The 15M-parameter model trains on a single GPU and demonstrates intrinsic physics understanding.
Gur Singh Claims 7 M4 MacBooks Match A100, Calls Cloud GPU Training a 'Scam'
Developer Gur Singh posted that seven M4 MacBooks (2.9 TFLOPS each) match an NVIDIA A100's performance, calling cloud GPU training a 'scam' and advocating for distributed, consumer-hardware approaches.
Claude MCP GPU Debugging: AI Agent Identifies PyTorch Bottleneck in Kernel
A developer used an AI agent powered by Claude Code and the Model Context Protocol (MCP) to diagnose a severe GPU performance bottleneck. The agent analyzed system kernel traces, pinpointing excessive CPU context switches as the culprit, demonstrating a practical application of agentic AI for complex technical debugging.
Hugging Face Launches 'Kernels' Hub for GPU Code, Like GitHub for AI Hardware
Hugging Face has launched 'Kernels,' a new section on its Hub for sharing and discovering optimized GPU kernels. This treats performance-critical code as a first-class artifact, similar to AI models.
AI Compute Crisis: GPU Prices Up 48%, Anthropic API at 98.95% Uptime
The AI industry faces a severe compute capacity crisis, with GPU prices up 48%, Anthropic API uptime falling to 98.95%, and OpenAI shutting down Sora to reallocate resources. Demand for agentic AI is outstripping supply, forcing rationing and product cancellations.