blackwell
30 articles about Blackwell in AI news
NVIDIA NVFP4 on Blackwell Cuts JAX Training by 1.8x in MaxText
NVIDIA NVFP4 on Blackwell achieves a 1.8x training speedup over FP8 in JAX/MaxText, with no accuracy loss claimed for models up to 70B; validation at larger scale is still needed.
Blackwell NVLink Breaks Confidential Compute, 61% Regression Reported
NVIDIA Blackwell confidential computing disables NVLink multicast, causing 61% regression on SGLang Qwen3.5 397B. Hopper had unencrypted NVLink, compounding the issue.
NVIDIA Vera Rubin NVL72 Cuts Agentic AI Cost 10x vs Blackwell
NVIDIA Vera Rubin NVL72 cuts agentic AI inference cost 10x vs Blackwell, per Huang at Dell event. 5,000 enterprises already on Dell factories.
Perplexity Claims 3x Blackwell Inference Throughput for 70B Models
Perplexity AI claims 3x inference throughput for 70B models on Nvidia Blackwell GPUs via FP4 and custom scheduling. The gain exceeds Nvidia's own 2x marketing claim.
Nvidia Blackwell CLC Boosts GEMM Tile Scheduling by 15% Over Static Persistence
Nvidia Blackwell CLC delivers up to 15% higher GEMM throughput via dynamic persistent tile scheduling, fixing load imbalance without startup overhead.
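The load-imbalance argument behind dynamic tile scheduling can be illustrated with a toy model. The sketch below is not the CLC hardware mechanism or any NVIDIA API; it only compares a fixed round-robin tile assignment against a "next free worker takes the next tile" policy. The function names and tile costs are hypothetical and chosen for illustration.

```python
import heapq

def static_makespan(tile_costs, num_workers):
    # Static persistence (toy model): worker w always processes tiles
    # w, w + W, w + 2W, ... regardless of how long each tile takes.
    return max(sum(tile_costs[w::num_workers]) for w in range(num_workers))

def dynamic_makespan(tile_costs, num_workers):
    # Dynamic scheduling (toy model): whichever worker becomes free first
    # takes the next tile in the queue.
    finish_times = [0] * num_workers
    heapq.heapify(finish_times)
    for cost in tile_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Made-up per-tile costs with two expensive tiles that both land on worker 0
# under round-robin assignment.
tile_costs = [4, 1, 1, 1, 4, 1, 1, 1]
static_time = static_makespan(tile_costs, num_workers=2)
dynamic_time = dynamic_makespan(tile_costs, num_workers=2)
print(static_time, dynamic_time)
```

With these made-up costs the static assignment is bounded by the overloaded worker, while the dynamic policy reaches the ideal even split. Real gains depend on the actual tile cost distribution.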
Unsloth × NVIDIA Cut LLM Fine-Tuning ~25% — Three Glue-Code Wins on Blackwell
Daniel & Michael Han at Unsloth, in collaboration with NVIDIA, published a joint guide quantifying three glue-code optimizations that combine for ~25% faster LLM training on B200 Blackwell hardware. The wins target overhead around the main kernels — caching packed-sequence metadata, double-buffered gradient checkpoint reloads, and a cheaper GPT-OSS MoE router using argsort + bincount. All three are merged via public PRs.
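The argsort + bincount idea mentioned for the MoE router can be sketched as follows. This is a minimal NumPy illustration of grouping tokens by expert, not the code from the Unsloth PRs; the function name, the top-1 routing assumption, and the sample data are all hypothetical.

```python
import numpy as np

def group_tokens_by_expert(expert_ids, num_experts):
    """Group token indices by assigned expert using argsort + bincount."""
    # Stable sort keeps tokens in their original order within each expert.
    order = np.argsort(expert_ids, kind="stable")
    # Tokens routed to each expert (zero for experts that received none).
    counts = np.bincount(expert_ids, minlength=num_experts)
    # offsets[e]:offsets[e + 1] is expert e's slice of `order`.
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return order, counts, offsets

# Made-up top-1 expert assignment for six tokens and four experts.
expert_ids = np.array([2, 0, 2, 1, 0, 2])
order, counts, offsets = group_tokens_by_expert(expert_ids, num_experts=4)

# Token indices handled by expert 2.
expert_2_tokens = order[offsets[2]:offsets[3]]
```

One sort and one count yield contiguous per-expert slices, which avoids building a separate boolean mask for every expert.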
NVIDIA Open-Sources MRC, the RDMA Protocol Powering OpenAI's Blackwell Clusters
NVIDIA open-sourced MRC, a multi-path RDMA protocol used by OpenAI on Blackwell clusters, enabling microsecond rerouting across 64 paths.
NVIDIA Feynman GPU Power Semi Content Hits $191K, 17× Blackwell
NVIDIA Feynman GPUs require $191K in power semiconductors per system, 17× Blackwell, driven by 800V DC architecture shift.
Pyptx: Write Nvidia PTX Kernels in Python for Hopper and Blackwell
Pyptx lets developers write and launch hand-tuned Nvidia PTX kernels directly from Python, supporting Hopper (sm_90a) and Blackwell (sm_100a). It provides explicit control over registers, shared memory, and advanced features like WGMMA and TMA, with dispatch through JAX, PyTorch eager, and torch.compile.
Cursor AI Claims 1.84x Faster MoE Inference on NVIDIA Blackwell GPUs
Cursor AI announced a rebuilt inference engine for Mixture-of-Experts models on NVIDIA's new Blackwell GPUs, resulting in a claimed 1.84x speedup and improved output accuracy.
Nvidia Claims MLPerf Inference v6.0 Records with 288-GPU Blackwell Ultra Systems, Highlights 2.7x Software Gains
MLCommons released MLPerf Inference v6.0 results, introducing multimodal and video model tests. Nvidia set records using 288-GPU Blackwell Ultra systems and achieved a 2.7x performance jump on DeepSeek-R1 via software optimizations alone.
DeepSeek's Blackwell Training Exposes Critical Gaps in US Chip Export Controls
Chinese AI startup DeepSeek reportedly trained its latest model on Nvidia's restricted Blackwell chips, challenging US export controls. The development reveals significant loopholes in semiconductor restrictions amid escalating AI competition.
DeepSeek's Blackwell Gambit: How a Chinese AI Firm Reportedly Circumvented U.S. Chip Export Controls
Chinese AI company DeepSeek reportedly trained its upcoming model using Nvidia's restricted Blackwell chips, potentially clustered in an Inner Mongolia data center. This development highlights the escalating tech rivalry and challenges of enforcing export controls in the AI arms race.
AI Power Shift: How DeepSeek's Alleged Blackwell Chip Access Could Reshape Global AI Race
Chinese AI startup DeepSeek reportedly trained its next major model on Nvidia's banned Blackwell chips, potentially triggering a seismic shift in the AI landscape. US giants Google, OpenAI, and Anthropic are preparing for what could be a market-disrupting release next week.
NVIDIA's Blackwell Ultra Shatters Efficiency Records: 50x Performance Per Watt Leap Redefines AI Economics
NVIDIA's new Blackwell Ultra GB300 NVL72 systems promise a staggering 50x improvement in performance per megawatt and 35x lower cost per token compared to previous Hopper architecture, addressing the critical energy bottleneck in AI scaling.
Anthropic Leases xAI's Colossus 1 After Mixed-Architecture Flaw Blocked
Anthropic leased xAI's 220K-GPU Colossus 1 after its mixed architecture failed to train Grok. Musk builds Blackwell-only Colossus 2 for training and IPO.
NVIDIA, DOE Build 100K-GPU Supercomputer for Science
DOE and NVIDIA announced Solstice, a 100K-GPU Vera Rubin supercomputer delivering 5,000 exaflops, and Equinox with 10K Blackwell GPUs.
We Hosted a 35B LLM on an NVIDIA DGX Spark — A Technical Post-Mortem
A detailed, practical guide to deploying the Qwen3.5-35B model on NVIDIA's GB10 Blackwell hardware. The article serves as a case study on the real-world challenges and solutions for on-premise LLM inference.
Lilly's AI Factory: How a 9,000+ GPU SuperPOD is Rewriting Pharmaceutical Discovery
Eli Lilly has launched 'LillyPod,' the world's most powerful privately-owned AI factory for drug discovery. Powered by NVIDIA's new DGX B300 systems with over 1,000 Blackwell Ultra GPUs, it promises to accelerate medical breakthroughs at unprecedented scale.
Meta's $135 Billion AI Bet: How Confidential Computing Will Transform WhatsApp
Meta commits to buying millions of NVIDIA Blackwell and Rubin GPUs in a landmark partnership, deploying confidential computing technology to bring AI to WhatsApp while protecting user privacy. This represents a major shift in how AI will be integrated into secure messaging platforms.
Meta's Multi-Million GPU Gamble: How a Chip Deal Redefines AI's Future
Meta has signed a massive, multi-year pact with Nvidia to deploy millions of next-generation Blackwell and Rubin GPUs across its data centers. This unprecedented hardware commitment signals a new phase in the AI arms race, where computational scale becomes the primary competitive moat.
Yotta Data Services Seeks $4B Valuation in Pre-IPO Round, Expands India's Largest Nvidia GPU Cluster
Indian data center operator Yotta is raising $500-600M at a ~$4B valuation ahead of an IPO. The firm is scaling its Nvidia H100 and Blackwell (B200/B300) GPU fleet to position itself as a domestic AI infrastructure alternative.
Nvidia, Unitree, Sharpa unveil H2+ humanoid robot reference design
Nvidia, Unitree, and Sharpa released H2+, a humanoid robot reference design, at Computex 2026 to standardize physical AI development workflows.
SemiAnalysis Calls Jensen Computex Keynote 'F Tier' Over No AI DC News
SemiAnalysis rated Jensen Huang's Computex keynote 'F Tier' for containing no AI datacenter news and revealed a delayed NVIDIA ARM chip with broken video output.
Nvidia Unveils New Windows SoC, Targeting AI PCs
Nvidia announced a Windows SoC for AI PCs, per @mweinbach. Chip targets on-device inference, competing with Qualcomm and Intel.
Dell Ships First Nvidia Vera Rubin NVL72 Rack to CoreWeave
Dell delivered the first Nvidia Vera Rubin NVL72 rack to CoreWeave. Each rack packs 72 Rubin GPUs, 36 Vera CPUs, 3.6 exaFLOPS FP4 inference, 75 TB memory, and 260 TB/s NVLink bandwidth.
SemiAnalysis: N3 chip demand far outstrips current consensus estimates
SemiAnalysis argues N3 chip demand far exceeds consensus accelerator models, implying a structural silicon shortage not priced by markets.
Cerebras CS4 Stays on 5nm as SRAM Scaling Flattens
Cerebras CS4 stays on 5nm due to SRAM scaling flattening, per @SemiAnalysis_. 3nm offers no density gain, so the chip prioritizes yield and cost.
Jensen Huang Wants Zero Coding at NVIDIA — 'Purpose vs Task'
Jensen Huang wants zero coding by NVIDIA engineers, framing it as a task to minimize. The bet is AI-generated code will match human output for performance-critical software.
Google and Blackstone Launch TPU Venture, Challenging Nvidia Dominance
Google and Blackstone launched a TPU venture, financing AI infrastructure outside the hyperscale cloud model. Enterprise buyers get a standalone alternative to Nvidia-dominated GPU clusters.