rocm
14 articles about rocm in AI news
AMD ROCm Performance Jumps 75x in 14 Days Post-DeepSeek v4
AMD ROCm stack improved 75x in 14 days post-DeepSeek v4 via fused operations. Still needs 5x more to match B200 performance.
Intel Targets Nvidia, AMD with New AI Chip Launch by End 2026
Intel plans to launch a new AI data center chip by end of 2026, targeting Nvidia and AMD in the AI infrastructure market.
AMD's Lemonade v10.8 Adds MCP Support, Letting Claude Desktop and Cursor Route Tasks to Local AMD GPUs
AMD-backed Lemonade v10.8, released June 17, now exposes a Model Context Protocol server, letting Claude Desktop, Cursor, and GitHub Copilot route inference tasks to local AMD Ryzen AI NPUs, Radeon GPUs, or plain CPUs — no cloud API required. The update also adds Moonshine speech-to-text, expanded R
OpenAI DeploymentSim predicts GPT-5 errors 92% of the time pre-launch
OpenAI's Deployment Simulation predicted GPT-5 errors with 92% accuracy using 1.3M real conversations, outperforming standard safety tests.
TensorWave Raises $350M Series B for AMD-Powered GPU Clusters
TensorWave raised $350M Series B for AMD-powered GPU clusters in North America, challenging Nvidia's dominance.
vLLM Optimizations Cut Voice AI Latency by 40% on 6-GPU Cluster
vLLM optimizations on a 6-GPU cluster reduced voice AI latency by 40% for a Qwen-based system, enabling 500 concurrent sessions per node without hardware upgrades.
AMD Gives OSS Maintainers $3.6M MI355X Cluster Access
AMD gives vLLM/SGLang maintainers $3.6M MI355X cluster access, ending NVIDIA's monopoly on OSS inference hardware access.
AMD Launches PCIe GPU for AI Workloads, Targets Existing Server Install Base
AMD launched a PCIe-based GPU for AI workloads, targeting existing servers. The card provides immediate boost without new data center buildouts.
AMD MI350P PCIe Card Claims 40% FP8 Lead Over Nvidia H200 NVL
AMD launched MI350P PCIe AI card with 144GB HBM3E, claiming 39% FP8 lead over Nvidia H200 NVL. Targets drop-in air-cooled server upgrades.
AMD Backs UALink Open Interconnect to Challenge NVIDIA NVLink in AI
AMD is supporting the newly formed UALink Consortium, which aims to create an open standard for connecting AI accelerators. This move challenges NVIDIA's control over the critical NVLink technology that underpins its AI data center systems.
Hugging Face Launches 'Kernels' Hub for GPU Code, Like GitHub for AI Hardware
Hugging Face has launched 'Kernels,' a new section on its Hub for sharing and discovering optimized GPU kernels. This treats performance-critical code as a first-class artifact, similar to AI models.
Developer Ranks NPU Model Compilation Ease: Apple 1st, AMD Last
Developer @mweinbach ranked the ease of using AI coding agents to compile ML models for NPUs. Apple's ecosystem was rated easiest, while AMD's tooling was ranked most difficult.
Ollama Now Supports Apple MLX Backend for Local LLM Inference on macOS
Ollama, the popular framework for running large language models locally, has added support for Apple's MLX framework as a backend. This enables more efficient execution of models like Llama 3.2 and Mistral on Apple Silicon Macs.
98× Faster LLM Routing Without a Dedicated GPU: Technical Breakthrough for vLLM Semantic Router
New research presents a three-stage optimization pipeline for the vLLM Semantic Router, achieving 98× speedup and enabling long-context classification on shared GPUs. This solves critical memory and latency bottlenecks for system-level LLM routing.