gentic.news — AI News Intelligence Platform
gpus

30 articles about gpus in AI news

mlx-vlm v0.6.2 Adds Gemma 4 QAT Support for Local GPUs

mlx-vlm v0.6.2 adds launch-day support for Google DeepMind's Gemma 4 QAT checkpoints, enabling local inference on consumer GPUs and edge devices with video input for the 12B model.

100% relevant

Nvidia Networking Revenue Hits $14.8B, Up 199% as AI Spending Shifts Beyond GPUs

Nvidia's Q1 FY2027 networking revenue surged 199% to $14.8B, signaling that AI infrastructure spending is moving beyond GPUs into full-system networking. The new reporting split into Hyperscale and ACIE segments reflects a broadening customer base beyond hyperscalers.

100% relevant

Nscale to Deploy 66K+ Rubin GPUs for Microsoft in Portugal

Nscale will deploy 66,000+ NVIDIA Rubin GPUs for Microsoft at Portugal's Start Campus. The deal is a first for Rubin and signals Microsoft's geographic diversification.

80% relevant

Jensen Huang's 30-Year TSMC Battle: From 3D Graphics to AI GPUs

A 30-year-old comic shows Jensen Huang convincing TSMC to supply wafers for 3D graphics chips. Today, he's still fighting for wafer supply, but now for AI GPUs, alongside Broadcom, AMD, MediaTek, and Amazon.

75% relevant

A Practical Guide to Fine-Tuning an LLM on RunPod H100 GPUs with QLoRA

The source is a technical tutorial on using QLoRA for parameter-efficient fine-tuning of an LLM, leveraging RunPod's cloud H100 GPUs. It focuses on the practical setup and execution steps for engineers.

76% relevant

Intel, SambaNova Blueprint Pairs GPUs for AI Prefill, RDUs for Decoding

Intel and SambaNova Systems have outlined a new inference architecture for agentic AI workloads. It splits tasks between GPUs for 'prefill' and SambaNova's Reconfigurable Dataflow Units (RDUs) for high-throughput token generation.

85% relevant

Cursor AI Claims 1.84x Faster MoE Inference on NVIDIA Blackwell GPUs

Cursor AI announced a rebuilt inference engine for Mixture-of-Experts models on NVIDIA's new Blackwell GPUs, resulting in a claimed 1.84x speedup and improved output accuracy.

85% relevant

Mistral Secures $830M Debt to Build Paris Data Center with 14,000 Nvidia GB300 GPUs

French AI startup Mistral has raised $830 million in debt financing to build and operate a sovereign AI data center near Paris, set to host nearly 14,000 Nvidia GB300 GPUs. The move signals a strategic European push for bespoke AI infrastructure, distinct from the gigawatt-scale builds of US hyperscalers.

90% relevant

CoreWeave, Nebius Earnings Show AI Race Shifts From GPUs to Power

CoreWeave and Nebius Q1 earnings show the AI infrastructure race shifting from GPU supply to power and scale, with combined capex guidance exceeding $55B.

90% relevant

Astera Labs Scorpio X-Series Switch Targets 49% Collective IO Cut for Idle GPUs

Astera Labs introduced the Scorpio X-Series 320-lane switch, targeting a 49% reduction in collective IO for fragmented AI workloads. Shipments to hyperscalers have begun, with a broad ramp in H2 2026.

92% relevant

Open-Weight 1T Model Inference Margins Hit 88% on Rented GPUs

Renting a 128-GPU cluster to serve a 1T open-weight model yields ~88% margin on tokens sold at $0.002/1K, exposing a structural arbitrage over proprietary APIs.

85% relevant

DARPA Leases 50 Nvidia H100 GPUs for Biological AI Program

DARPA's Biological Technologies Office is procuring 50 Nvidia HGX H100 GPU systems for its NODES program, with hardware delivery required within one month. This represents a significant government investment in AI infrastructure for biological research applications.

86% relevant

Wiwynn Shows First SCADA Server: 2.9PB, No CPU for I/O

Wiwynn showed the first Nvidia SCADA server at Computex 2026, with 2.9 PB of storage and 528M IOPS; its GPUs bypass the CPU for I/O. The design marks a shift in AI storage architecture.

89% relevant

Supermicro Shows Vera Rubin NVL72 Rack With New Coolant Type

Supermicro showed a Vera Rubin NVL72 rack that uses a new coolant type. The rack targets Nvidia Rubin GPUs and ships in early 2027.

73% relevant

Dell Ships First Nvidia Vera Rubin NVL72 Rack to CoreWeave

Dell delivered the first Nvidia Vera Rubin NVL72 rack to CoreWeave. Each rack packs 72 Rubin GPUs, 36 Vera CPUs, 3.6 exaFLOPS of FP4 inference compute, 75 TB of memory, and 260 TB/s of NVLink bandwidth.

100% relevant

Perplexity Claims 3x Blackwell Inference Throughput for 70B Models

Perplexity AI claims 3x inference throughput for 70B models on Nvidia Blackwell GPUs via FP4 and custom scheduling. The gain exceeds Nvidia's own 2x marketing claim.

85% relevant

NVIDIA, DOE Build 100K-GPU Supercomputer for Science

DOE and NVIDIA announced Solstice, a 100K-GPU Vera Rubin supercomputer delivering 5,000 exaflops, and Equinox with 10K Blackwell GPUs.

80% relevant

Anthropic's 220K GPU Cluster: $5B Compute Bet Revealed

Anthropic reportedly has 220K NVIDIA GPUs and 310 MW of power, implying a compute cluster worth more than $5B, 3x OpenAI's largest.

100% relevant

Anthropic Doubles Claude Code Rate Limits, Leases All of SpaceX's Colossus 1

Anthropic doubled Claude Code's 5-hour rate limits and removed peak-hour throttling for Pro, Max, Team, and seat-based Enterprise plans, then disclosed the source of the new capacity: a lease on the entire Colossus 1 data center — 300 MW and ~220,000 NVIDIA GPUs in Memphis — that SpaceX absorbed when it took over xAI.

100% relevant

JPMorgan: Agentic AI Could Flip Server Ratio to CPU-Heavy

JPMorgan reports that agentic AI workloads could increase CPU demand, potentially flipping the GPU-to-CPU ratio from 7-8 GPUs per CPU to CPU-heavy deployments, with a $100B TAM for AI CPU infrastructure.

96% relevant

Meta Deploys Millions of Amazon Graviton CPUs for AI Agents

Meta will deploy tens of millions of AWS Graviton5 CPU cores for AI agent workloads, signaling that agentic inference favors CPUs over GPUs. The deal deepens Meta's $200B+ infrastructure push amid layoffs and cloud rivalry.

96% relevant

SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU

SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.

85% relevant

Nvidia Invests $2B in Marvell to Expand NVLink Fusion Chip Partnership

Nvidia is investing $2 billion in Marvell Technology to deepen their partnership on NVLink Fusion, a chip-to-chip interconnect crucial for scaling AI training clusters. This strategic move aims to secure supply and accelerate development of high-bandwidth links between GPUs and custom AI accelerators.

84% relevant

Mac Studio AI Hardware Shortage Signals Shift to Cloud Rentals

Developers report a global shortage of high-memory Apple Silicon Macs, with 128GB Mac Studios unavailable worldwide. This pushes practitioners toward renting cloud H100 GPUs at ~$3/hr, marking a shift from the recent local AI trend.

85% relevant

NVIDIA's cuQuantum-DGX OS Aims to Manage Hybrid Quantum-Classical Workflows

NVIDIA announced its AI software stack is evolving into an operating system for quantum computing, aiming to manage the complex workflow between quantum processors and classical GPUs. This targets a major integration bottleneck as quantum hardware scales.

85% relevant

Hugging Face OCRs 27,000 arXiv Papers to Markdown with Open 5B Model

Hugging Face CEO Clement Delangue announced the OCR conversion of 27,000 arXiv papers to Markdown using an open 5B-parameter model and 16 parallel jobs on L40S GPUs. This demonstrates a scalable, open-source pipeline for large-scale academic document processing.

85% relevant

Microsoft's BitNet Enables 100B-Parameter LLMs on CPU, Cuts Energy 82%

Microsoft Research's BitNet project demonstrates 1-bit LLMs with 100B parameters that run efficiently on CPUs, using 82% less energy while maintaining performance, challenging the need for GPUs in local deployment.

95% relevant

Google's 5M H100-Equivalent GPU Fleet Powers Anthropic's AI Expansion

An analyst estimates Google's compute capacity at ~5 million Nvidia H100-equivalent GPUs, providing the infrastructure backbone for Anthropic's model deployment and growth. This highlights the strategic shift where foundational AI labs rely on hyperscaler scale for distribution.

85% relevant

Nvidia DLSS 4.5 Launches with Enhanced AI Frame Generation and Ray Reconstruction

Nvidia has released DLSS 4.5, a major update to its AI-powered upscaling technology featuring new frame generation modes and improved ray reconstruction. The update is available now for GeForce RTX 40 and 50 Series GPUs.

85% relevant

Google's TurboQuant Compresses LLM KV Cache 6x with Zero Accuracy Loss, Cutting GPU Memory by 80%

Google researchers introduced TurboQuant, a method that compresses LLM KV cache from 32-bit to 3-bit precision without accuracy degradation. This reduces GPU memory consumption by over 80% and speeds up inference 8x on H100 GPUs.

97% relevant
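
The memory claim in the TurboQuant summary above can be sanity-checked with simple arithmetic. The sketch below is not from the article: the model shape (32 layers, 8 KV heads, head dimension 128, 8,192 tokens) and the function names are hypothetical, chosen only for illustration. It counts raw bit widths alone and ignores any per-block scales or other metadata, which the summary does not describe, so it need not match the headline's 6x figure.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bits_per_value):
    """Raw KV cache size in bytes: one key and one value vector
    per layer, per KV head, per token, per sequence in the batch."""
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value / 8


def memory_reduction(bits_before, bits_after):
    """Fraction of memory saved when only the bit width changes."""
    return 1 - bits_after / bits_before


# Hypothetical model shape, for illustration only.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192, batch_size=1)

full = kv_cache_bytes(**shape, bits_per_value=32)   # 32-bit baseline from the summary
small = kv_cache_bytes(**shape, bits_per_value=3)   # 3-bit target from the summary

GIB = 2 ** 30
print(f"32-bit cache: {full / GIB:.4f} GiB")                  # 2.0000 GiB
print(f"3-bit cache:  {small / GIB:.4f} GiB")                 # 0.1875 GiB
print(f"reduction:    {memory_reduction(32, 3):.1%}")         # 90.6%
```

Under these assumptions the raw reduction is 1 - 3/32, roughly 90.6%, which is consistent with the summary's "over 80%" once some overhead is allowed for.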