cuda
30 articles about cuda in AI news
NanoEuler: GPT-2-Scale 116M Model Built in Pure C/CUDA From Scratch
NanoEuler is a 116M-parameter GPT-2-scale model built in pure C/CUDA from scratch. It provides a complete educational training pipeline for understanding LLMs at the lowest level.
MLX CUDA Backend Passes All Tests, Closing Apple GPU Gap
MLX CUDA backend passes all tests, enabling NVIDIA GPU support. Milestone bridges Apple Silicon and CUDA ecosystems for ML workloads.
OpenAI Codex Now Translates C++, CUDA, and Python to Swift and Python for CoreML Model Conversion
OpenAI's Codex AI code generator is now being used to automatically rewrite C++, CUDA, and Python code into Swift and Python specifically for CoreML model conversion, a previously manual and error-prone process for Apple ecosystem deployment.
ByteDance's CUDA Agent: The AI System Outperforming Human Experts in GPU Code Generation
ByteDance has unveiled CUDA Agent, a large-scale reinforcement learning system that generates high-performance CUDA kernels. The system achieves state-of-the-art results, outperforming torch.compile by up to 100% and beating leading AI models like Claude Opus 4.5 and Gemini 3 Pro by approximately 40% on the most challenging tasks.
WSL 3 Preview: Cut Claude Code's Local Inference Latency on Windows
WSL 3 preview delivers near-native GPU/NPU for Claude Code + Ollama on Copilot+ laptops, but WSL 2 still handles NVIDIA CUDA fine for desktop users.
LlamaFactory Enables No-Code Fine-Tuning for 100+ LLMs Including Llama 4, Qwen, and DeepSeek
The LlamaFactory project eliminates traditional fine-tuning complexity with a drag-and-click interface, supporting over 100 models. This reduces setup from hours of boilerplate code and CUDA debugging to a visual workflow.
Nvidia's Open-Source Gambit: NeMoClaw Aims to Tame Enterprise AI Agents
Nvidia is preparing to launch NeMoClaw, an open-source platform designed for building secure, autonomous AI agents for enterprise workflows. Breaking from its proprietary CUDA tradition, the move targets software ecosystem dominance regardless of hardware.
Jim Keller: Tenstorrent IPO Looms as BlackHole Chip Scales
Jim Keller confirmed Tenstorrent's IPO plans as BlackHole chip scales for AI inference, competing with Nvidia. No revenue disclosed.
OpenAI-Broadcom Chip Hints at Token Price Collapse
OpenAI and Broadcom are co-developing a custom AI inference chip that could cut token prices by an order of magnitude, per @mweinbach. The chip targets inference workloads, not training, and aims to reduce dependency on Nvidia.
NVIDIA Vera Rubin: One Rack Matches TOP500, 35 EU Labs Deploy
NVIDIA's Vera Rubin NVL72 delivers TOP500-class performance in a single rack, with 35 European labs deploying the system for AI and HPC.
How Simon Willison Ported a 0.2B Image Model to the Browser with Claude
Simon Willison used Claude Code to port a 0.2B image inpainting model to WebGPU, running it as a parallel side project while his main agent worked on Datasette. The technique? Research with Claude.ai, then hand off to Claude Code with research.md.
Qualcomm in Talks to Acquire Modular for $4B, Landing Lattner
Qualcomm nears $4B acquisition of Modular, Chris Lattner's AI infra startup. Deal targets inference software for edge and data center AI chips.
NVFP4 GEMM on RTX Pro Blackwell: SM12x Breaks from B200 Programming Model
NVIDIA's SM12x architecture drops tcgen05.mma for mma.sync, breaking B200 kernel compatibility. SM8x kernels port easily; developers must maintain separate codebases.
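To illustrate why separate code paths follow from this split, here is a minimal host-side sketch of choosing a kernel family by GPU compute capability. The function name `select_mma_path` is hypothetical, and the mapping assumes B200 reports compute capability 10.x and RTX Pro Blackwell reports 12.x; the teaser itself only states that SM12x uses `mma.sync` instead of `tcgen05.mma` and that SM8x kernels port easily.

```python
def select_mma_path(major: int) -> str:
    """Pick a matrix-multiply instruction family from the compute capability major version.

    Assumed mapping (not from the article): major 10 = B200, major 12 = RTX Pro Blackwell.
    """
    if major == 10:
        # Assumption: B200-class parts, where kernels are written against tcgen05.mma.
        return "tcgen05.mma"
    if major in (8, 12):
        # SM8x and SM12x both use mma.sync, which is why SM8x kernels port easily.
        return "mma.sync"
    raise ValueError(f"no kernel path defined for compute capability major {major}")


for major in (8, 10, 12):
    print(major, select_mma_path(major))
```

A library shipping both variants would key its kernel registry on a check like this, so a B200 kernel is never launched on an SM12x device.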
Intel Targets Nvidia, AMD with New AI Chip Launch by End 2026
Intel plans to launch a new AI data center chip by end of 2026, targeting Nvidia and AMD in the AI infrastructure market.
AWS Beats Cloud Rivals to NVIDIA Blackwell with EC2 G7 — 4.6x AI Inference Gain Over G6
AWS launched EC2 G7 instances on June 19, 2026, becoming the first major cloud to offer NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. The instances claim 4.6x AI inference performance over G6, backed by 700 Gbps EFA networking and 32 GB GDDR7 per GPU. The move arrives the same week AWS confirme
NVIDIA, GENCI Launch AI Factory France Compute Access for Startups
NVIDIA and GENCI launched AI Factory France at VivaTech, giving European startups free access to AI supercomputers. The program includes compute, tools, and expert support for NVIDIA Inception members.
Tensordyne Claims 10x Efficiency Gain with Napier Architecture
Tensordyne claims its Napier architecture is 10x more efficient than Nvidia hardware at inference, but the claim lacks supporting data or independent verification.
AMD's Lemonade v10.8 Adds MCP Support, Letting Claude Desktop and Cursor Route Tasks to Local AMD GPUs
AMD-backed Lemonade v10.8, released June 17, now exposes a Model Context Protocol server, letting Claude Desktop, Cursor, and GitHub Copilot route inference tasks to local AMD Ryzen AI NPUs, Radeon GPUs, or plain CPUs — no cloud API required. The update also adds Moonshine speech-to-text, expanded R
Qualcomm Launches AI Data Center Program With Hyperscaler Customer
Qualcomm launched an AI data center program with a major hyperscaler customer, targeting inference workloads. Financial terms and partner identity undisclosed.
Intel Omni-Path Resurfaces as InfiniBand Rival for DoE Supercomputers
Intel's Omni-Path interconnect, revived by Cornelis Networks, will connect DoE supercomputers at 400Gbps as an InfiniBand alternative.
Cerebras Claims Performance Parity With Nvidia H100 on AI Training
Cerebras claims wafer-scale chips match Nvidia H100 on AI training performance per watt, challenging Nvidia's dominance.
NVIDIA Blackwell Ultra Leads First Agentic AI Benchmark, 20x Agents/MW vs Hopper
NVIDIA Blackwell Ultra NVL72 leads the first AgentPerf benchmark for agentic AI, delivering 20x more agents per megawatt than Hopper.
TensorWave Raises $350M Series B for AMD-Powered GPU Clusters
TensorWave raised $350M Series B for AMD-powered GPU clusters in North America, challenging Nvidia's dominance.
Nvidia Buys Kumo AI for $400M to Predict from Business Data
Nvidia acquired Kumo AI for $400M+ to bring foundation model predictions to enterprise relational data, filling a gap left by LLMs.
Foxconn and Intel Partner on AI Data Center Rack Systems
Foxconn and Intel partner on AI rack systems, integrating Intel components into Foxconn manufacturing for hyperscale customers. No financial terms disclosed.
Nvidia, Unitree, Sharpa unveil H2+ humanoid robot reference design
Nvidia, Unitree, and Sharpa released H2+, a humanoid robot reference design, at Computex 2026 to standardize physical AI development workflows.
Nvidia Unveils New Windows SoC, Targeting AI PCs
Nvidia announced a Windows SoC for AI PCs, per @mweinbach. Chip targets on-device inference, competing with Qualcomm and Intel.
NVIDIA Nemotron 3 Ultra: 550B Open-Weight Model Challenges GLM, Kimi
NVIDIA released Nemotron 3 Ultra, a 550B open-weight model claiming near-SOTA performance, competing with GLM-5.1 and Kimi K2.6. No benchmarks yet.
xAI Drops JAX, Builds Custom C Training Framework After <10% MFU
xAI dropped JAX for GPU training after measuring model FLOPs utilization (MFU) below 10%, and is building a custom C training framework with Grok Build. NVIDIA's JAX team loses its biggest customer.
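For readers unfamiliar with the metric, MFU is the ratio of the FLOPs a training run actually achieves to the hardware's theoretical peak. A minimal sketch follows, using the common approximation of about 6 FLOPs per parameter per token for dense transformer training (attention FLOPs ignored). The function name and all numbers are hypothetical and are not xAI's figures.

```python
def model_flops_utilization(n_params: float,
                            tokens_per_second: float,
                            peak_flops_per_second: float) -> float:
    """Return MFU as a fraction of peak, for dense transformer training."""
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be positive")
    # Approximation: forward + backward pass costs ~6 FLOPs per parameter per token.
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical example: a 7B-parameter model training at 2,000 tokens/s
# on a device with 1 PFLOP/s of peak throughput.
mfu = model_flops_utilization(n_params=7e9,
                              tokens_per_second=2.0e3,
                              peak_flops_per_second=1.0e15)
print(f"MFU = {mfu:.1%}")
```

In this made-up case the run uses well under a tenth of the device's peak, which is the kind of result the teaser describes.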
Jensen Huang Wants Zero Coding at NVIDIA — 'Purpose vs Task'
Jensen Huang wants NVIDIA engineers to do zero coding, framing coding as a task to minimize. The bet is that AI-generated code will match human output for performance-critical software.