gpus
30 articles about gpus in AI news
mlx-vlm v0.6.2 Adds Gemma 4 QAT Support for Local GPUs
mlx-vlm v0.6.2 adds launch-day support for Google DeepMind's Gemma 4 QAT checkpoints, enabling local inference on consumer GPUs and edge devices with video input for the 12B model.
Nvidia Networking Revenue Hits $14.8B, Up 199% as AI Spending Shifts Beyond GPUs
Nvidia's Q1 FY2027 networking revenue surged 199% to $14.8B, signaling AI infrastructure spending is moving beyond GPUs into full-system networking. New reporting splits into Hyperscale and ACIE segments reflect a broadening customer base beyond hyperscalers.
Nscale to Deploy 66K+ Rubin GPUs for Microsoft in Portugal
Nscale will deploy 66,000+ NVIDIA Rubin GPUs for Microsoft at Portugal's Start Campus. The deal is a first for Rubin and signals Microsoft's geographic diversification.
Jensen Huang's 30-Year TSMC Battle: From 3D Graphics to AI GPUs
A 30-year-old comic shows Jensen Huang convincing TSMC to supply wafers for 3D graphics chips. Today, he's still fighting for wafer supply, but now for AI GPUs, alongside Broadcom, AMD, MediaTek, and Amazon.
A Practical Guide to Fine-Tuning an LLM on RunPod H100 GPUs with QLoRA
The source is a technical tutorial on using QLoRA for parameter-efficient fine-tuning of an LLM, leveraging RunPod's cloud H100 GPUs. It focuses on the practical setup and execution steps for engineers.
Intel, SambaNova Blueprint Pairs GPUs for AI Prefill, RDUs for Decoding
Intel and SambaNova Systems have outlined a new inference architecture for agentic AI workloads. It splits tasks between GPUs for 'prefill' and SambaNova's Reconfigurable Dataflow Units (RDUs) for high-throughput token generation.
Cursor AI Claims 1.84x Faster MoE Inference on NVIDIA Blackwell GPUs
Cursor AI announced a rebuilt inference engine for Mixture-of-Experts models on NVIDIA's new Blackwell GPUs, resulting in a claimed 1.84x speedup and improved output accuracy.
Mistral Secures $830M Debt to Build Paris Data Center with 14,000 Nvidia GB300 GPUs
French AI startup Mistral has raised $830 million in debt financing to build and operate a sovereign AI data center near Paris, set to host nearly 14,000 Nvidia GB300 GPUs. The move signals a strategic European push for bespoke AI infrastructure, distinct from the gigawatt-scale builds of US hyperscalers.
CoreWeave, Nebius Earnings Show AI Race Shifts From GPUs to Power
CoreWeave and Nebius Q1 earnings show AI infrastructure race shifting from GPU supply to power and scale, with combined capex guidance exceeding $55B.
Astera Labs Scorpio X-Series Switch Targets 49% Collective IO Cut for Idle GPUs
Astera Labs introduced Scorpio X-Series 320-lane switch targeting 49% collective IO reduction for fragmented AI workloads. Shipments to hyperscalers began, with broad ramp in H2 2026.
Open-Weight 1T Model Inference Margins Hit 88% on Rented GPUs
Renting a 128 GPU cluster to serve a 1T open model yields ~88% margin on tokens sold at $0.002/1K, exposing a structural arbitrage over proprietary APIs.
DARPA Leases 50 Nvidia H100 GPUs for Biological AI Program
DARPA's Biological Technologies Office is procuring 50 Nvidia HGX H100 GPU systems for its NODES program, with hardware delivery required within one month. This represents a significant government investment in AI infrastructure for biological research applications.
Wiwynn Shows First SCADA Server: 2.9PB, No CPU for I/O
Wiwynn showed first Nvidia SCADA server at Computex 2026: 2.9 PB storage, 528M IOPS, GPUs bypass CPU for I/O. Marks shift in AI storage architecture.
Supermicro Shows Vera Rubin NVL72 Rack With New Coolant Type
Supermicro showed Vera Rubin NVL72 rack with new coolant. Rack targets Nvidia Rubin GPUs, ships early 2027.
Dell Ships First Nvidia Vera Rubin NVL72 Rack to CoreWeave
Dell delivered the first Nvidia Vera Rubin NVL72 rack to CoreWeave. Each rack packs 72 Rubin GPUs, 36 Vera CPUs, 3.6 exaFLOPS FP4 inference, 75 TB memory, and 260 TB/s NVLink bandwidth.
Perplexity Claims 3x Blackwell Inference Throughput for 70B Models
Perplexity AI claims 3x inference throughput for 70B models on Nvidia Blackwell GPUs via FP4 and custom scheduling. The gain exceeds Nvidia's own 2x marketing claim.
NVIDIA, DOE Build 100K-GPU Supercomputer for Science
DOE and NVIDIA announced Solstice, a 100K-GPU Vera Rubin supercomputer delivering 5,000 exaflops, and Equinox with 10K Blackwell GPUs.
Anthropic's 220K GPU Cluster: $5B Compute Bet Revealed
Anthropic reportedly has 220K NVIDIA GPUs and 310MW, implying a >$5B compute cluster, 3x OpenAI's largest.
Anthropic Doubles Claude Code Rate Limits, Leases All of SpaceX's Colossus 1
Anthropic doubled Claude Code's 5-hour rate limits and removed peak-hour throttling for Pro, Max, Team, and seat-based Enterprise plans, then disclosed the source of the new capacity: a lease on the entire Colossus 1 data center — 300 MW and ~220,000 NVIDIA GPUs in Memphis — that SpaceX absorbed when it took over xAI.
JPMorgan: Agentic AI Could Flip Server Ratio to CPU-Heavy
JPMorgan reports that agentic AI workloads could increase CPU demand, potentially flipping the GPU-to-CPU ratio from 7-8 GPUs per CPU to CPU-heavy deployments, with a $100B TAM for AI CPU infrastructure.
Meta Deploys Millions of Amazon Graviton CPUs for AI Agents
Meta will deploy tens of millions of AWS Graviton5 CPU cores for AI agent workloads, signaling that agentic inference favors CPUs over GPUs. The deal deepens Meta's $200B+ infrastructure push amid layoffs and cloud rivalry.
SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU
SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.
Nvidia Invests $2B in Marvell to Expand NVLink Fusion Chip Partnership
Nvidia is investing $2 billion in Marvell Technology to deepen their partnership on NVLink Fusion, a chip-to-chip interconnect crucial for scaling AI training clusters. This strategic move aims to secure supply and accelerate development of high-bandwidth links between GPUs and custom AI accelerators.
Mac Studio AI Hardware Shortage Signals Shift to Cloud Rentals
Developers report a global shortage of high-memory Apple Silicon Macs, with 128GB Mac Studios unavailable worldwide. This pushes practitioners toward renting cloud H100 GPUs at ~$3/hr, marking a shift from the recent local AI trend.
NVIDIA's cuQuantum-DGX OS Aims to Manage Hybrid Quantum-Classical Workflows
NVIDIA announced its AI software stack is evolving into an operating system for quantum computing, aiming to manage the complex workflow between quantum processors and classical GPUs. This targets a major integration bottleneck as quantum hardware scales.
Hugging Face OCRs 27,000 arXiv Papers to Markdown with Open 5B Model
Hugging Face CEO Clement Delangue announced the OCR conversion of 27,000 arXiv papers to Markdown using an open 5B-parameter model and 16 parallel jobs on L40S GPUs. This demonstrates a scalable, open-source pipeline for large-scale academic document processing.
Microsoft's BitNet Enables 100B-Parameter LLMs on CPU, Cuts Energy 82%
Microsoft Research's BitNet project demonstrates 1-bit LLMs with 100B parameters that run efficiently on CPUs, using 82% less energy while maintaining performance, challenging the need for GPUs in local deployment.
Google's 5M H100-Equivalent GPU Fleet Powers Anthropic's AI Expansion
An analyst estimates Google's compute capacity at ~5 million Nvidia H100-equivalent GPUs, providing the infrastructure backbone for Anthropic's model deployment and growth. This highlights the strategic shift where foundational AI labs rely on hyperscaler scale for distribution.
Nvidia DLSS 4.5 Launches with Enhanced AI Frame Generation and Ray Reconstruction
Nvidia has released DLSS 4.5, a major update to its AI-powered upscaling technology featuring new frame generation modes and improved ray reconstruction. The update is available now for GeForce RTX 40 and 50 Series GPUs.
Google's TurboQuant Compresses LLM KV Cache 6x with Zero Accuracy Loss, Cutting GPU Memory by 80%
Google researchers introduced TurboQuant, a method that compresses LLM KV cache from 32-bit to 3-bit precision without accuracy degradation. This reduces GPU memory consumption by over 80% and speeds up inference 8x on H100 GPUs.