gentic.news — AI News Intelligence Platform

blackwell

30 articles about blackwell in AI news

NVIDIA NVFP4 on Blackwell Cuts JAX Training by 1.8x in MaxText

NVIDIA NVFP4 on Blackwell achieves a 1.8x training speedup over FP8 in JAX/MaxText, with a claim of no accuracy loss for models up to 70B. Validation at larger scale is still needed.

85% relevant

Blackwell NVLink Breaks Confidential Compute, 61% Regression Reported

NVIDIA Blackwell confidential computing disables NVLink multicast, causing 61% regression on SGLang Qwen3.5 397B. Hopper had unencrypted NVLink, compounding the issue.

100% relevant

NVIDIA Vera Rubin NVL72 Cuts Agentic AI Cost 10x vs Blackwell

NVIDIA Vera Rubin NVL72 cuts agentic AI inference cost 10x vs Blackwell, per Huang at Dell event. 5,000 enterprises already on Dell factories.

95% relevant

Perplexity Claims 3x Blackwell Inference Throughput for 70B Models

Perplexity AI claims 3x inference throughput for 70B models on Nvidia Blackwell GPUs via FP4 and custom scheduling. The gain exceeds Nvidia's own 2x marketing claim.

85% relevant

Nvidia Blackwell CLC Boosts GEMM Tile Scheduling by 15% Over Static Persistence

Nvidia Blackwell CLC delivers up to 15% higher GEMM throughput via dynamic persistent tile scheduling, fixing load imbalance without startup overhead.

95% relevant

Unsloth × NVIDIA Cut LLM Fine-Tuning ~25% — Three Glue-Code Wins on Blackwell

Daniel & Michael Han at Unsloth, in collaboration with NVIDIA, published a joint guide quantifying three glue-code optimizations that combine for ~25% faster LLM training on B200 Blackwell hardware. The wins target overhead around the main kernels — caching packed-sequence metadata, double-buffered gradient checkpoint reloads, and a cheaper GPT-OSS MoE router using argsort + bincount. All three are merged via public PRs.

87% relevant

NVIDIA Open-Sources MRC, the RDMA Protocol Powering OpenAI's Blackwell Clusters

NVIDIA open-sourced MRC, a multi-path RDMA protocol used by OpenAI on Blackwell clusters, enabling microsecond rerouting across 64 paths.

100% relevant

NVIDIA Feynman GPU Power Semi Content Hits $191K, 17× Blackwell

NVIDIA Feynman GPUs require $191K in power semiconductors per system, 17× Blackwell, driven by 800V DC architecture shift.

95% relevant

Pyptx: Write Nvidia PTX Kernels in Python for Hopper and Blackwell

Pyptx lets developers write and launch hand-tuned Nvidia PTX kernels directly from Python, supporting Hopper (sm_90a) and Blackwell (sm_100a). It provides explicit control over registers, shared memory, and advanced features like WGMMA and TMA, with dispatch through JAX, PyTorch eager, and torch.compile.

91% relevant

Cursor AI Claims 1.84x Faster MoE Inference on NVIDIA Blackwell GPUs

Cursor AI announced a rebuilt inference engine for Mixture-of-Experts models on NVIDIA's new Blackwell GPUs, resulting in a claimed 1.84x speedup and improved output accuracy.

85% relevant

Nvidia Claims MLPerf Inference v6.0 Records with 288-GPU Blackwell Ultra Systems, Highlights 2.7x Software Gains

MLCommons released MLPerf Inference v6.0 results, introducing multimodal and video model tests. Nvidia set records using 288-GPU Blackwell Ultra systems and achieved a 2.7x performance jump on DeepSeek-R1 via software optimizations alone.

95% relevant

DeepSeek's Blackwell Training Exposes Critical Gaps in US Chip Export Controls

Chinese AI startup DeepSeek reportedly trained its latest model on Nvidia's restricted Blackwell chips, challenging US export controls. The development reveals significant loopholes in semiconductor restrictions amid escalating AI competition.

90% relevant

DeepSeek's Blackwell Gambit: How a Chinese AI Firm Reportedly Circumvented U.S. Chip Export Controls

Chinese AI company DeepSeek reportedly trained its upcoming model using Nvidia's restricted Blackwell chips, potentially clustered in an Inner Mongolia data center. This development highlights the escalating tech rivalry and challenges of enforcing export controls in the AI arms race.

95% relevant

AI Power Shift: How DeepSeek's Alleged Blackwell Chip Access Could Reshape Global AI Race

Chinese AI startup DeepSeek reportedly trained its next major model on Nvidia's banned Blackwell chips, potentially triggering a seismic shift in the AI landscape. US giants Google, OpenAI, and Anthropic are preparing for what could be a market-disrupting release next week.

80% relevant

NVIDIA's Blackwell Ultra Shatters Efficiency Records: 50x Performance Per Watt Leap Redefines AI Economics

NVIDIA's new Blackwell Ultra GB300 NVL72 systems promise a staggering 50x improvement in performance per megawatt and 35x lower cost per token compared to previous Hopper architecture, addressing the critical energy bottleneck in AI scaling.

95% relevant

Anthropic Leases xAI's Colossus 1 After Mixed-Architecture Flaw Blocked

Anthropic leased xAI's 220K-GPU Colossus 1 after the cluster's mixed architecture failed to train Grok. Musk is building a Blackwell-only Colossus 2 for training and IPO.

100% relevant

NVIDIA, DOE Build 100K-GPU Supercomputer for Science

DOE and NVIDIA announced Solstice, a 100K-GPU Vera Rubin supercomputer delivering 5,000 exaflops, and Equinox with 10K Blackwell GPUs.

80% relevant

We Hosted a 35B LLM on an NVIDIA DGX Spark — A Technical Post-Mortem

A detailed, practical guide to deploying the Qwen3.5-35B model on NVIDIA's GB10 Blackwell hardware. The article serves as a crucial case study on the real-world challenges and solutions for on-premise LLM inference.

95% relevant

Lilly's AI Factory: How a 9,000+ GPU SuperPOD is Rewriting Pharmaceutical Discovery

Eli Lilly has launched 'LillyPod,' the world's most powerful privately-owned AI factory for drug discovery. Powered by NVIDIA's new DGX B300 systems with over 1,000 Blackwell Ultra GPUs, it promises to accelerate medical breakthroughs at unprecedented scale.

80% relevant

Meta's $135 Billion AI Bet: How Confidential Computing Will Transform WhatsApp

Meta commits to buying millions of NVIDIA Blackwell and Rubin GPUs in a landmark partnership, deploying confidential computing technology to bring AI to WhatsApp while protecting user privacy. This represents a major shift in how AI will be integrated into secure messaging platforms.

82% relevant

Meta's Multi-Million GPU Gamble: How a Chip Deal Redefines AI's Future

Meta has signed a massive, multi-year pact with Nvidia to deploy millions of next-generation Blackwell and Rubin GPUs across its data centers. This unprecedented hardware commitment signals a new phase in the AI arms race, where computational scale becomes the primary competitive moat.

85% relevant

Yotta Data Services Seeks $4B Valuation in Pre-IPO Round, Expands India's Largest Nvidia GPU Cluster

Indian data center operator Yotta is raising $500-600M at a ~$4B valuation ahead of an IPO. The firm is scaling its Nvidia H100 and Blackwell (B200/B300) GPU fleet to position itself as a domestic AI infrastructure alternative.

79% relevant

Nvidia, Unitree, Sharpa unveil H2+ humanoid robot reference design

Nvidia, Unitree, and Sharpa released H2+, a humanoid robot reference design, at Computex 2026 to standardize physical AI development workflows.

90% relevant

SemiAnalysis Calls Jensen Computex Keynote 'F Tier' Over No AI DC News

SemiAnalysis rated Jensen Huang's Computex keynote 'F Tier' for containing no AI datacenter news and revealed a delayed NVIDIA ARM chip with broken video output.

82% relevant

Nvidia Unveils New Windows SoC, Targeting AI PCs

Nvidia announced a Windows SoC for AI PCs, per @mweinbach. The chip targets on-device inference, competing with Qualcomm and Intel.

100% relevant

Dell Ships First Nvidia Vera Rubin NVL72 Rack to CoreWeave

Dell delivered the first Nvidia Vera Rubin NVL72 rack to CoreWeave. Each rack packs 72 Rubin GPUs, 36 Vera CPUs, 3.6 exaFLOPS FP4 inference, 75 TB memory, and 260 TB/s NVLink bandwidth.

100% relevant

SemiAnalysis: N3 chip demand far outstrips current consensus estimates

SemiAnalysis argues N3 chip demand far exceeds consensus accelerator models, implying a structural silicon shortage not priced by markets.

89% relevant

Cerebras CS4 Stays on 5nm as SRAM Scaling Flattens

Cerebras CS4 stays on 5nm due to SRAM scaling flattening, per @SemiAnalysis_. 3nm offers no density gain, so the chip prioritizes yield and cost.

85% relevant

Jensen Huang Wants Zero Coding at NVIDIA — 'Purpose vs Task'

Jensen Huang wants zero coding by NVIDIA engineers, framing it as a task to minimize. The bet is that AI-generated code will match human output for performance-critical software.

77% relevant

Google and Blackstone Launch TPU Venture, Challenging Nvidia Dominance

Google and Blackstone launched a TPU venture, financing AI infrastructure outside the hyperscale cloud model. Enterprise buyers get a standalone alternative to Nvidia-dominated GPU clusters.

85% relevant