cloud gpu
30 articles about cloud gpu in AI news
Gur Singh Claims 7 M4 MacBooks Match A100, Calls Cloud GPU Training a 'Scam'
Developer Gur Singh posted that seven M4 MacBooks (2.9 TFLOPS each) match an NVIDIA A100's performance, calling cloud GPU training a 'scam' and advocating for distributed, consumer-hardware approaches.
Cloud GPU vs. Colocation: H100 Costs $8k/Month on Google Cloud vs. $1k Colo
A technical founder highlights the stark economics: renting one H100 on Google Cloud costs ~$8,000/month, while the retail hardware is ~$30,000. At that rate, 4 months of cloud rental equals the cost of outright ownership, making colocation at ~$1k/month a compelling alternative for sustained AI workloads.
VS Code Now Connects Directly to Google Colab With Free T4 GPU
Google Colab integrates with VS Code, offering a free T4 GPU inside the editor, bypassing cloud GPU providers.
AirTrain Enables Distributed ML Training on MacBooks Over Wi-Fi
Developer @AlexanderCodes_ open-sourced AirTrain, a tool that enables distributed ML training across Apple Silicon MacBooks using Wi-Fi by syncing gradients every 500 steps instead of every step. This makes personal device training feasible for models up to 70B parameters without cloud GPU costs.
Cerebras Hits 981 Tokens/sec on 1T-Parameter Kimi K2.6, Claims 6.7× GPU Cloud Speedup
Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model, a 6.7× speedup over the next GPU cloud, validated by an independent third party.
IOWN Forum Pushes All-Photonic WAN for AI Neocloud Interconnects
The IOWN Global Forum is focusing its optical networking tech on datacenter interconnects, aiming to let GPU 'neoclouds' and financial firms use cheaper, remote facilities without latency penalties for AI workloads.
Mac Studio AI Hardware Shortage Signals Shift to Cloud Rentals
Developers report a global shortage of high-memory Apple Silicon Macs, with 128GB Mac Studios unavailable worldwide. This pushes practitioners toward renting cloud H100 GPUs at ~$3/hr, marking a shift from the recent local AI trend.
A Practical Guide to Fine-Tuning an LLM on RunPod H100 GPUs with QLoRA
The source is a technical tutorial on using QLoRA for parameter-efficient fine-tuning of an LLM, leveraging RunPod's cloud H100 GPUs. It focuses on the practical setup and execution steps for engineers.
Fine-Tuning an LLM on a 4GB GPU: A Practical Guide for Resource-Constrained Engineers
A Medium article provides a practical, constraint-driven guide for fine-tuning LLMs on a 4GB GPU, covering model selection, quantization, and parameter-efficient methods. This makes bespoke AI model development more accessible without high-end cloud infrastructure.
Open-source AI system running on $500 GPU reportedly outperforms Claude Sonnet
An open-source AI system running on consumer-grade $500 GPU hardware claims to outperform Anthropic's Claude Sonnet model while costing only $0.004 per task, eliminating cloud dependencies and API costs.
IREN Buys Nostrum Group, Adds 490MW Spanish Grid Power for AI Cloud
IREN acquired Nostrum Group, adding 490MW grid power in Spain for AI cloud expansion.
TensorWave Raises $350M Series B for AMD-Powered GPU Clusters
TensorWave raised $350M Series B for AMD-powered GPU clusters in North America, challenging Nvidia's dominance.
Apple Ditches Apple Silicon Pledge, Routes AI Queries to Google Cloud
Apple routes AI queries to Google Cloud, breaking 2024 Apple silicon pledge. Distilled Gemini runs locally; heavier queries use Nvidia tech in Google Cloud.
Nvidia Networking Revenue Hits $14.8B, Up 199% as AI Spending Shifts Beyond GPUs
Nvidia's Q1 FY2027 networking revenue surged 199% to $14.8B, signaling AI infrastructure spending is moving beyond GPUs into full-system networking. New reporting splits into Hyperscale and ACIE segments reflect a broadening customer base beyond hyperscalers.
vLLM Optimizations Cut Voice AI Latency by 40% on 6-GPU Cluster
vLLM optimizations on a 6-GPU cluster reduced voice AI latency by 40% for a Qwen-based system, enabling 500 concurrent sessions per node without hardware upgrades.
CoreWeave, Nebius Earnings Show AI Race Shifts From GPUs to Power
CoreWeave and Nebius Q1 earnings show AI infrastructure race shifting from GPU supply to power and scale, with combined capex guidance exceeding $55B.
MLX CUDA Backend Passes All Tests, Closing Apple GPU Gap
MLX CUDA backend passes all tests, enabling NVIDIA GPU support. Milestone bridges Apple Silicon and CUDA ecosystems for ML workloads.
NHN Deploys 7,656-GPU AI Cluster in Seoul
NHN launched a 7,656-GPU cluster in Seoul, South Korea, for domestic enterprise AI workloads. The cluster targets inference and training, competing with Naver and Kakao.
AMD Launches PCIe GPU for AI Workloads, Targets Existing Server Install Base
AMD launched a PCIe-based GPU for AI workloads, targeting existing servers. The card provides immediate boost without new data center buildouts.
Anthropic's 220K GPU Cluster: $5B Compute Bet Revealed
Anthropic reportedly has 220K NVIDIA GPUs and 310MW, implying a >$5B compute cluster, 3x OpenAI's largest.
Nscale to Deploy 66K+ Rubin GPUs for Microsoft in Portugal
Nscale will deploy 66,000+ NVIDIA Rubin GPUs for Microsoft at Portugal's Start Campus. The deal is a first for Rubin and signals Microsoft's geographic diversification.
Open-Weight 1T Model Inference Margins Hit 88% on Rented GPUs
Renting a 128 GPU cluster to serve a 1T open model yields ~88% margin on tokens sold at $0.002/1K, exposing a structural arbitrage over proprietary APIs.
Nebius Claims First NVIDIA GB300 Exemplar Cloud for Training
Nebius becomes first cloud provider validated as NVIDIA Exemplar Cloud on GB300 for training, targeting hyperscale AI workloads.
Moore Threads Q1 Revenue Up, Building 100K-GPU AI Cluster
Moore Threads reports Q1 2026 revenue growth and confirms progress building a 100,000-GPU cluster for AI training, signaling growing domestic AI infrastructure in China despite US export controls.
Oracle Nabs $16B for Michigan AI Data Center, Rivaling Google Cloud
Oracle has secured $16 billion in funding for a massive AI data center in rural Michigan, a move that pits it directly against Google Cloud and other hyperscalers in the race to build AI infrastructure.
SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU
SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.
DARPA Leases 50 Nvidia H100 GPUs for Biological AI Program
DARPA's Biological Technologies Office is procuring 50 Nvidia HGX H100 GPU systems for its NODES program, with hardware delivery required within one month. This represents a significant government investment in AI infrastructure for biological research applications.
NVIDIA, Google Cloud Expand AI Partnership for Agentic & Physical AI
NVIDIA and Google Cloud announced an expanded partnership to advance agentic and physical AI, focusing on new infrastructure and software integrations. This builds on their existing collaboration to provide optimized AI training and inference platforms.
Cisco Reveals Scale-Across GPU Networking Needs 14x DCI Bandwidth
Cisco's chief architect detailed the massive bandwidth requirements for connecting AI clusters via 'scale-across' GPU networking, which needs 14x the capacity of traditional data center interconnects. This shift is creating a multi-billion dollar market for 800G coherent pluggables and deep-buffered switches.
AI Compute Crisis: GPU Prices Up 48%, Anthropic API at 98.95% Uptime
The AI industry faces a severe compute capacity crisis, with GPU prices up 48%, Anthropic API uptime falling to 98.95%, and OpenAI shutting down Sora to reallocate resources. Demand for agentic AI is outstripping supply, forcing rationing and product cancellations.