semianalysis
30 articles about semianalysis in AI news
Anthropic Opus 4.8 Cuts Bug-Finding Cost by 5x, SemiAnalysis Finds
Anthropic's Opus 4.8 with ultracode mode cuts the cost of finding severe bugs to ~1/5, per preliminary SemiAnalysis experiments with wide error bars.
SemiAnalysis Calls Jensen Computex Keynote 'F Tier' Over No AI DC News
SemiAnalysis rated Jensen Huang's Computex keynote 'F Tier' for its lack of AI datacenter news and revealed a delayed NVIDIA Arm chip with broken video output.
SemiAnalysis: N3 chip demand far outstrips current consensus estimates
SemiAnalysis argues N3 chip demand far exceeds consensus accelerator models, implying a structural silicon shortage not priced by markets.
SemiAnalysis: Perplexity Slack Bot Beats Claude in Internal Trial
SemiAnalysis found that Perplexity's Slack bot beat Claude in an internal trial. 96% of the token budget goes to Anthropic, but usage may shift.
Cerebras Understates On-Chip SRAM by 8x, SemiAnalysis Notes
Cerebras understates on-chip SRAM by 8x per SemiAnalysis, a rare under-specification in chip marketing.
SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU
SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.
Buffett Invests in Google After SemiAnalysis TPU Deep Dive
Berkshire Hathaway invested in Google in Q3 2025, after Buffett studied TPU v5p architecture. He compared it to railroads, citing 8,960 chips and 4.8 Tbps links.
ERCOT datacenter requests exceed grid capacity by 5x
ERCOT datacenter requests far exceed grid underwriting capacity, per @SemiAnalysis_, revealing grid approval as a binding constraint on AI infrastructure buildout.
Cerebras CS4 Stays on 5nm as SRAM Scaling Flattens
Cerebras CS4 stays on 5nm due to SRAM scaling flattening, per @SemiAnalysis_. 3nm offers no density gain, so the chip prioritizes yield and cost.
Median Coding Agent Hits 96k Input Tokens, Rewriting Inference Economics
SemiAnalysis found that the median coding agent uses 96k input tokens, based on 432k requests, shifting the focus of inference cost from output to context.
Vibe-Coding Bottleneck: CPU Box Rental Gets Harder
SemiAnalysis flags that the vibe-coding wave is making cheap CPU box rentals less routine, bottlenecking developers who need quick cloud compute for AI prototyping.
Datacenter Developers Flee City Zoning for Unincorporated County Land
Datacenter developers are siting projects on unincorporated county land to avoid city zoning delays, redrawing the AI infrastructure map per @SemiAnalysis_.
NVIDIA Vera Rubin VR NVL72: Value Extraction Engine Arrives
With the Vera Rubin VR NVL72, NVIDIA shifts from value vendor to value extractor, targeting TCO. SemiAnalysis argues this overturns the prior pricing paradigm.
CPU Demand Flipping the AI Narrative as Datacenter Growth Shifts
A new analysis from SemiAnalysis indicates CPU demand is rising in AI datacenters, reversing a narrative of GPU-only dominance. This shift signals changing workload patterns and infrastructure priorities.
Huawei Hits 1.5µm Bond Pitch in Kirin 2026 Chips, Beats TSMC
Huawei's 2026 Kirin chips achieve 1.5µm hybrid bonding pitch, 16-36x denser than TSMC. Next year targets 1µm.
xAI Drops JAX, Builds Custom C Training Framework After <10% MFU
xAI dropped JAX for GPU training after <10% MFU, building a custom C framework with Grok Build. NVIDIA's JAX team loses its biggest customer.
Blackwell NVLink Breaks Confidential Compute, 61% Regression Reported
NVIDIA Blackwell confidential computing disables NVLink multicast, causing 61% regression on SGLang Qwen3.5 397B. Hopper had unencrypted NVLink, compounding the issue.
DeepSeek v4 Pricing Cuts 75%: $0.43/M Tokens In
DeepSeek v4 API pricing was permanently cut 75% to $0.43/M input and $0.87/M output, enabled by needing 27% of the compute and 10% of the cache of v3.2.
US 'Stop Stealing Our Chips Act' Would Pay Whistleblowers 10-30% of Export Fines
Proposed US law would pay whistleblowers 10-30% of export-control fines, targeting AI chip smuggling to China through intermediaries like Malaysian resellers.
Google TPU 'Broadfly' Topology Scales Pod to 1,152 Chips
Google unveiled a Broadfly TPU topology at Cloud Next, scaling pods to 1,152 chips (4.5x larger than Ironwood) with a maximum of 7 hops. This inference-first design challenges NVIDIA's NVLink on scale and latency.
Cerebras IPO Challenges GPU Scaling Orthodoxy
Cerebras filed for IPO on April 21, betting wafer-scale chips can disrupt Nvidia's GPU cluster model for AI workloads.
Cerebras' Tokenomics Bet: AWS, OpenAI Deals and Wafer-Scale Edge
Cerebras' tokenomics pricing and AWS/OpenAI partnerships challenge NVIDIA's inference dominance, offering a 5x cost reduction per token via its wafer-scale architecture.
AMD Gives OSS Maintainers $3.6M MI355X Cluster Access
AMD gives vLLM/SGLang maintainers $3.6M MI355X cluster access, ending NVIDIA's monopoly on OSS inference hardware access.
B200 PD Disaggregation Boosts Token Throughput 7x, Slashes Cost
B200 clusters with PD disaggregation over RoCEv2 Ethernet achieve 7x token throughput, cutting cost per million tokens 7x.
AMD ROCm Performance Jumps 75x in 14 Days Post-DeepSeek v4
AMD's ROCm stack improved 75x in 14 days post-DeepSeek v4 via fused operations. It still needs a further 5x to match B200 performance.
Micron's PSMC Fab Buy: A $3.8B Memory Bet
Micron acquires PSMC's P5 fab for $3.8B, converting it for HBM and DDR5 production. The deal cuts 12-18 months off time-to-volume, challenging Samsung's HBM lead.
NVIDIA Feynman GPU Power Semi Content Hits $191K, 17× Blackwell
NVIDIA Feynman GPUs require $191K in power semiconductors per system, 17× Blackwell, driven by 800V DC architecture shift.
ODMs Evolve from Manufacturers to AI Infrastructure Partners
ODMs shift from manufacturing to design/integration partners for AI racks, driven by GPU/ASIC complexity and liquid cooling.
Intel's UCIe-S Hits 48 Gb/s on 22nm, Beats 3nm EMIB
Intel demonstrated a UCIe-S die-to-die interconnect on 22nm hitting 48 Gb/s/lane over standard organic substrate, beating a 3nm EMIB design with 3× higher data rate and 2.8× higher bandwidth density. This signals a strategic shift away from EMIB for Intel's own products toward UCIe over substrate.
Nvidia B200 Costs $6,400 to Produce, Gross Margin Hits 82%
Epoch AI estimates Nvidia's B200 GPU costs $5,700–$7,300 to produce, with HBM memory and advanced packaging accounting for two-thirds of the cost. At a $30k–$40k sale price, chip-level gross margins reach ~82%, though rack-scale margins may be lower.
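The ~82% figure can be roughly sanity-checked with a few lines of arithmetic. The sketch below is not Epoch AI's method: it assumes gross margin is simply (sale price - production cost) / sale price, takes the $6,400 headline cost, and uses the midpoint of the $30k-$40k price range as an illustrative sale price. The function and variable names are hypothetical.

```python
def gross_margin(unit_cost: float, sale_price: float) -> float:
    """Return gross margin as a fraction of the sale price."""
    if sale_price <= 0:
        raise ValueError("sale_price must be positive")
    return (sale_price - unit_cost) / sale_price


# Figures quoted in the summary above (US dollars).
COST_LOW, COST_HIGH = 5_700, 7_300
PRICE_LOW, PRICE_HIGH = 30_000, 40_000
HEADLINE_COST = 6_400

# Assumption: use the midpoint of the quoted price range as the sale price.
midpoint_price = (PRICE_LOW + PRICE_HIGH) / 2

headline_margin = gross_margin(HEADLINE_COST, midpoint_price)
worst_case = gross_margin(COST_HIGH, PRICE_LOW)   # highest cost, lowest price
best_case = gross_margin(COST_LOW, PRICE_HIGH)    # lowest cost, highest price

print(f"headline: {headline_margin:.1%}")
print(f"range:    {worst_case:.1%} to {best_case:.1%}")
```

Under these assumptions the headline case comes out at about 81.7%, in line with the quoted ~82%, while the extremes of the cost and price ranges span roughly 75.7% to 85.8%.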