GPU programming

30 articles about GPU programming in AI news

Karpathy's 'Autoresearch' Tool Democratizes AI Research: One GPU, One Night, 100 Experiments

Andrej Karpathy has open-sourced 'autoresearch,' a tool that enables AI to autonomously improve its own training code. By writing simple prompts in Markdown, researchers can have AI agents run hundreds of experiments overnight on a single GPU, dramatically accelerating the research process.

Mar 8, 2026 | 95% relevant

NVFP4 GEMM on RTX Pro Blackwell: SM12x Breaks from B200 Programming Model

NVIDIA's SM12x architecture drops tcgen05.mma for mma.sync, breaking B200 kernel compatibility. SM8x kernels port easily; developers must maintain separate codebases.

Jun 21, 2026 | 86% relevant

Karpathy's Autonomous AI Researcher: Programming the Programmer in the Age of Agentic Science

Andrej Karpathy has open-sourced an autonomous AI research agent that can run ~100 experiments overnight without human supervision. The system turns research into a game with fixed-time trials, where prompt engineering replaces manual coding.

Mar 7, 2026 | 95% relevant

Jensen Huang Declares AI Has Democratized Programming Through 'Vibe Coding'

NVIDIA CEO Jensen Huang claims AI has eliminated the technology divide, enabling anyone to become a software programmer through 'vibe coding.' He cites examples of individuals creating million-dollar businesses using these new AI-powered development tools.

Mar 5, 2026 | 85% relevant

ByteDance's CUDA Agent: The AI System Outperforming Human Experts in GPU Code Generation

ByteDance has unveiled CUDA Agent, a large-scale reinforcement learning system that generates high-performance CUDA kernels. The system achieves state-of-the-art results, outperforming torch.compile by up to 100% and beating leading AI models like Claude Opus 4.5 and Gemini 3 Pro by approximately 40% on the most challenging tasks.

Mar 2, 2026 | 95% relevant

NVIDIA's cuQuantum-DGX OS Aims to Manage Hybrid Quantum-Classical Workflows

NVIDIA announced its AI software stack is evolving into an operating system for quantum computing, aiming to manage the complex workflow between quantum processors and classical GPUs. This targets a major integration bottleneck as quantum hardware scales.

Apr 14, 2026 | 85% relevant

InCoder-32B-Thinking Hits 81.3% on LiveCodeBench, Trained on Chip & Kernel Traces

InCoder-32B-Thinking, a 32B-parameter model trained on execution traces from chip design, GPU kernels, and embedded systems, scores 81.3% on LiveCodeBench V5 and achieves an 84% compile pass rate on CAD-Coder.

Apr 11, 2026 | 92% relevant

Karpathy's Autoresearch: Democratizing AI Experimentation with Minimalist Agentic Tools

Andrej Karpathy releases 'autoresearch,' a 630-line Python tool enabling AI agents to autonomously conduct machine learning experiments on single GPUs. This minimalist framework transforms how researchers approach iterative ML optimization.

Mar 9, 2026 | 85% relevant

Google's TensorFlow 2.21 Revolutionizes Edge AI with Unified LiteRT Framework

Google has launched TensorFlow 2.21, marking LiteRT's transition to a production-ready universal on-device inference framework. This major update delivers faster GPU performance, new NPU acceleration, and seamless PyTorch edge deployment, effectively replacing TensorFlow Lite for mobile and edge applications.

Mar 7, 2026 | 75% relevant

NVIDIA's SVG Benchmark Saturation Signals New Era in AI Graphics Performance

NVIDIA CEO Jensen Huang's presentation of the next RTX 6000 GPU series reveals that SVG benchmark performance has reached saturation, indicating a major milestone in AI-accelerated graphics rendering capabilities.

Feb 26, 2026 | 85% relevant

Cerebras' Strategic Partnership Yields Breakthrough AI Training Results

Cerebras Systems' partnership with Abu Dhabi's G42 has produced remarkable AI training benchmarks, achieving results 100x faster than traditional GPU clusters. The collaboration demonstrates the viability of wafer-scale computing for large language model development.

Feb 20, 2026 | 85% relevant

Rimnot, Yuantu Deploy 10K Robots in Server Factories by 2027

Rimnot is partnering with Yuantu to deploy 10K robots in server factories by 2027, having achieved 30-minute adaptation at WAIC.

Jul 22, 2026 | 82% relevant

Soofi S 30B-A3B: German open model tops English, German benchmarks

German consortium releases Soofi S 30B-A3B, an open MoE model beating OLMo 3 and Apertus 70B on English and German benchmarks while activating only 3.2B of 31.6B parameters.

Jul 13, 2026 | 100% relevant

Jensen Huang Wants Zero Coding at NVIDIA — 'Purpose vs Task'

Jensen Huang wants zero coding by NVIDIA engineers, framing it as a task to minimize. The bet is AI-generated code will match human output for performance-critical software.

May 24, 2026 | 77% relevant

Ollama Now Runs Codex Locally: DeepSeek V4, Gemma 4, Qwen 3.6 Supported

Ollama integrates Codex support for DeepSeek V4, Gemma 4, and Qwen 3.6, enabling free local code generation and challenging OpenAI's API model.

May 15, 2026 | 83% relevant

Pyptx: Write Nvidia PTX Kernels in Python for Hopper and Blackwell

Pyptx lets developers write and launch hand-tuned Nvidia PTX kernels directly from Python, supporting Hopper (sm_90a) and Blackwell (sm_100a). It provides explicit control over registers, shared memory, and advanced features like WGMMA and TMA, with dispatch through JAX, PyTorch eager, and torch.compile.

Apr 26, 2026 | 91% relevant

Use Claude Code to Automate Systematic Literature Reviews

Claude Code can automate systematic literature reviews: scrape papers, extract key themes, and generate structured summaries, all from the terminal.

Apr 26, 2026 | 100% relevant

Nvidia Invests $2B in Marvell for NVLink Fusion Interconnect

Nvidia is investing $2 billion in Marvell Technology to deepen their partnership on NVLink Fusion, a new interconnect architecture for scaling AI clusters beyond current limits.

Apr 26, 2026 | 100% relevant

Developer Achieves 395x RTFx on M5 Max with Fastest Parakeet v3 for Apple ANE

Developer @mweinbach has optimized the Parakeet v3 speech recognition model for Apple's Neural Engine, achieving a 395x real-time factor on an M5 Max chip. This represents a significant performance leap for on-device AI inference on Apple Silicon.

Apr 22, 2026 | 87% relevant
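
For readers unfamiliar with the real-time factor (RTFx) quoted in the item above, here is a minimal sketch of the arithmetic. It assumes the common definition of RTFx as audio duration divided by processing time. The function names are hypothetical and are not taken from the Parakeet project or any library. The one-hour clip is an illustrative input, not a reported benchmark.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTFx under the assumed definition: seconds of audio per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, rtfx: float) -> float:
    """Invert the definition: compute time needed for a clip at a given RTFx."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


if __name__ == "__main__":
    one_hour = 3600.0  # illustrative clip length, in seconds
    # At the 395x figure from the blurb, one hour of audio needs about 9 seconds.
    print(f"{processing_time(one_hour, 395.0):.2f} s")
```

Under that definition, a 395x factor means an hour of speech would be transcribed in roughly nine seconds.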

AI-Powered PS4 Emulator 'Spine' Runs Bloodborne Locally on PC

A developer has released Spine, a PS4 emulator that uses AI techniques to run Bloodborne fully on PC. This represents a major step forward in console emulation, previously considered years away.

Apr 20, 2026 | 87% relevant

Claude Code Builds Browser-Based 3D Flight Simulator in Weekend

Over a single weekend, a developer used Anthropic's Claude Code to build a complete 3D flight simulator that runs in a web browser, demonstrating rapid AI-assisted game development.

Apr 18, 2026 | 85% relevant

Claude Code Runs 100% Locally on Mac via Native 200-Line API Server

A developer created a 200-line server that speaks Anthropic's API natively, allowing Claude Code to run entirely locally on M-series Macs at 65 tokens/second with no cloud dependency.

Apr 18, 2026 | 100% relevant

MLX-Benchmark Suite Launches as First Comprehensive LLM Eval for Apple Silicon

The MLX-Benchmark Suite has been released as the first comprehensive evaluation framework for Large Language Models running on Apple's MLX framework. It provides standardized metrics for models optimized for Apple Silicon hardware.

Apr 18, 2026 | 85% relevant

Qwen 3.6 Released: Free, Open-Weights Model for Local AI Coding

Alibaba's Qwen team released Qwen 3.6, an open-weights AI model for local deployment. This provides a free, private alternative to ID-verified models like Anthropic's Mythos and OpenAI's Codex.

Apr 17, 2026 | 100% relevant

NVIDIA Ising AI OS Cuts Quantum Calibration from Days to Hours

NVIDIA launched Ising, an open-source AI model family that acts as an OS for quantum computers. It uses a vision language model to automate calibration and a 3D neural network for error correction, reducing calibration from days to hours.

Apr 14, 2026 | 95% relevant

Claude Code's 'Shallow Thinking' Problem

Enterprise users report Claude Code sometimes skips deep analysis on complex tasks. Use specific prompting techniques and session management to ensure thorough reasoning.

Apr 13, 2026 | 87% relevant

NVIDIA CEO Jensen Huang Declares All Future Software Will Be Agentic

NVIDIA CEO Jensen Huang stated that all future software will be agentic, meaning every software company must transform into an agentic company. This vision positions AI agents as the fundamental architecture for future computing.

Apr 4, 2026 | 87% relevant

OpenAI Codex Now Translates C++, CUDA, and Python to Swift and Python for CoreML Model Conversion

OpenAI's Codex AI code generator is now being used to automatically rewrite C++, CUDA, and Python code into Swift and Python specifically for CoreML model conversion, a previously manual and error-prone process for Apple ecosystem deployment.

Apr 3, 2026 | 89% relevant

AI Engineer Henry Ndubuaku Releases Open-Source 'Maths, CS & AI Compendium' Textbook

AI engineer Henry Ndubuaku has published a free, open-source textbook compiling mathematics, computer science, and AI concepts. The resource emphasizes intuitive understanding over notation and has reportedly helped users land roles at DeepMind, OpenAI, and Nvidia.

Mar 27, 2026 | 85% relevant

Open-Source 'Manus Alternative' Emerges: Fully Local AI Agent with Web Browsing, Code Execution, and Voice Input

An open-source project has been released that replicates core features of AI agent platforms like Manus—autonomous web browsing, multi-language code execution, and voice input—while running entirely locally on user hardware with no external API dependencies.

Mar 26, 2026 | 85% relevant
