gentic.news — AI News Intelligence Platform

gpu

30 articles about gpu in AI news

mlx-vlm v0.6.2 Adds Gemma 4 QAT Support for Local GPUs

mlx-vlm v0.6.2 adds launch-day support for Google DeepMind's Gemma 4 QAT checkpoints, enabling local inference on consumer GPUs and edge devices with video input for the 12B model.

100% relevant

Cerebras Hits 981 Tokens/sec on 1T-Parameter Kimi K2.6, Claims 6.7× GPU Cloud Speedup

Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model, a 6.7× speedup over the next-fastest GPU cloud, validated by an independent third party.

93% relevant

Nvidia Networking Revenue Hits $14.8B, Up 199% as AI Spending Shifts Beyond GPUs

Nvidia's Q1 FY2027 networking revenue surged 199% to $14.8B, signaling that AI infrastructure spending is moving beyond GPUs into full-system networking. The new split of reporting into Hyperscale and ACIE segments reflects a customer base broadening beyond hyperscalers.

100% relevant

train-llm-from-scratch: 1B-Parameter LLM on a Single GPU

train-llm-from-scratch trains billion-parameter LLMs on a single GPU, bringing a job that once cost $10M+ down to consumer hardware.

85% relevant

vLLM Optimizations Cut Voice AI Latency by 40% on 6-GPU Cluster

vLLM optimizations on a 6-GPU cluster reduced voice AI latency by 40% for a Qwen-based system, enabling 500 concurrent sessions per node without hardware upgrades.

82% relevant

CoreWeave, Nebius Earnings Show AI Race Shifts From GPUs to Power

CoreWeave and Nebius Q1 earnings show the AI infrastructure race shifting from GPU supply to power and scale, with combined capex guidance exceeding $55B.

90% relevant

Cerebras IPO Challenges GPU Scaling Orthodoxy

Cerebras filed for an IPO on April 21, betting that wafer-scale chips can disrupt Nvidia's GPU cluster model for AI workloads.

98% relevant

MLX CUDA Backend Passes All Tests, Closing Apple GPU Gap

The MLX CUDA backend now passes all tests, enabling NVIDIA GPU support. The milestone bridges the Apple Silicon and CUDA ecosystems for ML workloads.

77% relevant

NHN Deploys 7,656-GPU AI Cluster in Seoul

NHN launched a 7,656-GPU cluster in Seoul, South Korea, for domestic enterprise AI workloads. The cluster targets inference and training, competing with Naver and Kakao.

90% relevant

VS Code Now Connects Directly to Google Colab With Free T4 GPU

Google Colab integrates with VS Code, offering a free T4 GPU inside the editor, bypassing cloud GPU providers.

91% relevant

Detecting AI Images: Metadata Exposes Generators, No GPU Needed

AI image detection via metadata analysis exposes generators like Google's Gemini and Meta's Llama without GPU clusters, highlighting a simple but effective method.

75% relevant

AMD Launches PCIe GPU for AI Workloads, Targets Existing Server Install Base

AMD launched a PCIe-based GPU for AI workloads, targeting existing servers. The card provides an immediate boost without new data center buildouts.

90% relevant

Kunluncore Files STAR Market IPO, Claims 32K GPU Cluster First

Kunluncore filed for a STAR Market IPO, claiming to be first with a 32K-GPU cluster, in a test of investor appetite for domestic AI chips.

85% relevant

NVIDIA, DOE Build 100K-GPU Supercomputer for Science

DOE and NVIDIA announced Solstice, a 100K-GPU Vera Rubin supercomputer delivering 5,000 exaflops, and Equinox with 10K Blackwell GPUs.

80% relevant

Anthropic's 220K GPU Cluster: $5B Compute Bet Revealed

Anthropic reportedly has 220K NVIDIA GPUs and 310 MW of power, implying a compute cluster worth more than $5B, 3× OpenAI's largest.

100% relevant

OpenAI's MRC Protocol Sprays Packets Across 100+ Paths to Fix GPU Stragglers

OpenAI open-sourced MRC, a networking protocol that sprays packets across hundreds of paths to reduce GPU idle time from congestion and failures, and contributed it to OCP.

88% relevant

Nscale to Deploy 66K+ Rubin GPUs for Microsoft in Portugal

Nscale will deploy 66,000+ NVIDIA Rubin GPUs for Microsoft at Portugal's Start Campus. The deal is a first for Rubin and signals Microsoft's geographic diversification.

80% relevant

NVIDIA Feynman GPU Power Semi Content Hits $191K, 17× Blackwell

NVIDIA Feynman GPUs require $191K in power semiconductors per system, 17× Blackwell, driven by the shift to an 800V DC architecture.

95% relevant

RoundPipe: Full Fine-Tune 32B Models on a Single 24GB GPU

RoundPipe fine-tunes 32B models on a single 24GB GPU with 1.5-2.2× speedups via round-robin pipeline dispatch.

85% relevant

Open-Weight 1T Model Inference Margins Hit 88% on Rented GPUs

Renting a 128-GPU cluster to serve a 1T-parameter open-weight model yields ~88% margin on tokens sold at $0.002/1K, exposing a structural arbitrage over proprietary APIs.

85% relevant

Moore Threads Q1 Revenue Up, Building 100K-GPU AI Cluster

Moore Threads reports Q1 2026 revenue growth and confirms progress building a 100,000-GPU cluster for AI training, signaling growing domestic AI infrastructure in China despite US export controls.

74% relevant

Jensen Huang's 30-Year TSMC Battle: From 3D Graphics to AI GPUs

A 30-year-old comic shows Jensen Huang convincing TSMC to supply wafers for 3D graphics chips. Today, he's still fighting for wafer supply, but now for AI GPUs, alongside Broadcom, AMD, MediaTek, and Amazon.

75% relevant

SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU

SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.

85% relevant

DARPA Leases 50 Nvidia H100 GPUs for Biological AI Program

DARPA's Biological Technologies Office is procuring 50 Nvidia HGX H100 GPU systems for its NODES program, with hardware delivery required within one month. This represents a significant government investment in AI infrastructure for biological research applications.

86% relevant

Cisco Reveals Scale-Across GPU Networking Needs 14x DCI Bandwidth

Cisco's chief architect detailed the massive bandwidth requirements for connecting AI clusters via 'scale-across' GPU networking, which needs 14x the capacity of traditional data center interconnects. This shift is creating a multi-billion dollar market for 800G coherent pluggables and deep-buffered switches.

85% relevant

LeWorldModel Solves JEPA Collapse with 15M Params, Trains on Single GPU

Researchers published LeWorldModel, solving the representation collapse problem in Yann LeCun's JEPA architecture. The 15M-parameter model trains on a single GPU and demonstrates intrinsic physics understanding.

95% relevant

Gur Singh Claims 7 M4 MacBooks Match A100, Calls Cloud GPU Training a 'Scam'

Developer Gur Singh posted that seven M4 MacBooks (2.9 TFLOPS each) match an NVIDIA A100's performance, calling cloud GPU training a 'scam' and advocating for distributed, consumer-hardware approaches.

77% relevant

Claude MCP GPU Debugging: AI Agent Identifies PyTorch Bottleneck in Kernel

A developer used an AI agent powered by Claude Code and the Model Context Protocol (MCP) to diagnose a severe GPU performance bottleneck. The agent analyzed system kernel traces, pinpointing excessive CPU context switches as the culprit, demonstrating a practical application of agentic AI for complex technical debugging.

72% relevant

Hugging Face Launches 'Kernels' Hub for GPU Code, Like GitHub for AI Hardware

Hugging Face has launched 'Kernels,' a new section on its Hub for sharing and discovering optimized GPU kernels. This treats performance-critical code as a first-class artifact, similar to AI models.

85% relevant

AI Compute Crisis: GPU Prices Up 48%, Anthropic API at 98.95% Uptime

The AI industry faces a severe compute capacity crisis, with GPU prices up 48%, Anthropic API uptime falling to 98.95%, and OpenAI shutting down Sora to reallocate resources. Demand for agentic AI is outstripping supply, forcing rationing and product cancellations.

100% relevant