Epoch AI

27 articles about Epoch AI in AI news

Epoch AI's CursorBench Benchmarks AI Code Editing at Scale

Epoch AI launched CursorBench, a 500-task benchmark for AI code editors. It reveals a 15% accuracy gap vs. humans and 3x latency variance.

Jun 27, 2026 · 93% relevant

SciCode: Epoch AI Launches Benchmark Measuring AI Research Ability

Epoch AI launched the SciCode benchmark, which tests LLMs on real research coding tasks. Top models score below 30%, exposing a gap between coding benchmarks and scientific ability.

Jun 27, 2026 · 95% relevant

Epoch AI: Hormuz LNG Shock Absorbed by Chip Margins, Gulf Investment is AI Risk

A new analysis from Epoch AI Research finds the Strait of Hormuz conflict's energy shock is manageable for AI infrastructure, but the real threat is the potential drying up of Gulf capital investment, crucial for projects like Stargate UAE.

Apr 10, 2026 · 85% relevant

AI Data Center Scale Doubles Every 7 Months, Epoch Finds

Epoch AI finds AI data center scale doubles every 7 months, driven by Google, Microsoft, and Amazon investments. This accelerates beyond the earlier 12-month cycle, raising training cost projections to $10 billion by 2028.

Jun 25, 2026 · 95% relevant

OSWorld 2.0 Launches, Tests AI Agents on 1,500 Desktop Tasks

Epoch AI released OSWorld 2.0 with 1,500 desktop tasks, up from 369 in v1, testing AI agents on adversarial and cross-application workflows.

Jun 27, 2026 · 77% relevant

Nvidia B200 Costs $6,400 to Produce, Gross Margin Hits 82%

Epoch AI estimates Nvidia's B200 GPU costs $5,700–$7,300 to produce, with HBM memory and advanced packaging accounting for two-thirds of the cost. At a $30k–$40k sale price, chip-level gross margins reach ~82%, though rack-scale margins may be lower.

Apr 24, 2026 · 100% relevant

Open-Weight Models Trail Frontier AI by Four Months: Epoch AI

Epoch AI finds that open-weight models trail frontier closed-source models by four months, a small gap that reflects rapid catch-up.

May 29, 2026 · 79% relevant

GPT-5.5 Pro Leapfrogs on Epoch Benchmark; Base Model Beats Prior Pro

A tweet from @kimmonismus reveals GPT-5.5 Pro shows significant Epoch benchmark gains, and the non-Pro GPT-5.5 surpasses GPT-5.4 Pro, suggesting major efficiency improvements at OpenAI.

Apr 29, 2026 · 99% relevant

WiT: Waypoint Diffusion Transformers Achieve FID 2.09 on ImageNet 256×256 in 265 Epochs, Matching JiT-L/16 Efficiency

Researchers introduced WiT, a diffusion transformer that uses semantic waypoints from pretrained vision models to resolve trajectory conflicts in pixel-space flow matching. It matches the performance of JiT-L/16 at 600 epochs in just 265 epochs, achieving an FID of 2.09 on ImageNet 256×256.

Mar 22, 2026 · 85% relevant

Colossus 2: xAI's Memphis Cluster Hits 300,000 GPUs

xAI's Colossus 2 hits 300,000 GPUs, targeting 1M by year-end. Training Grok-3, the $6B cluster challenges OpenAI and Google.

Jun 24, 2026 · 98% relevant

SemiAnalysis: Pretraining Dead for All but Frontier Labs

@SemiAnalysis_ declares pretraining dead for non-frontier labs, citing 'Pretrainitis' as vanity-driven waste. Prompt engineering offers higher ROI.

Jun 11, 2026 · 85% relevant

Time's First AI A-List: Alibaba, ByteDance, Zhipu AI Make Cut

Time magazine named Alibaba, ByteDance, and Zhipu AI to its first AI-specific top 10 list, alongside six US companies and France's Mistral AI. The recognition highlights China's growing global influence through open-source models and consumer AI apps.

Apr 29, 2026 · 74% relevant

A Practical Guide to Fine-Tuning Open-Source LLMs for AI Agents

This Portuguese-language Medium article is Part 2 of a series on LLM engineering for AI agents. It provides a hands-on guide to fine-tuning an open-source model, building on a foundation of clean data and established baselines from Part 1.

Apr 6, 2026 · 74% relevant

VMLOps Launches 'Algorithm Explorer' for Real-Time Visualization of AI Training Dynamics

VMLOps released Algorithm Explorer, an interactive tool that visualizes ML training in real time, showing gradients, weights, and decision boundaries. It combines math, visuals, and code to aid debugging and education.

Apr 1, 2026 · 85% relevant

China Surpasses US in AI Research Authorship with 2,152 First-Author Researchers in 2024

China now leads the US in first-author AI research contributions, with 2,152 researchers versus 1,810. This marks the first time China has overtaken the US in this key metric of research leadership.

Mar 29, 2026 · 87% relevant

Fine-Tuning LLMs While You Sleep: How Autoresearch and Red Hat Training Hub Outperformed the HINT3 Benchmark

Automated fine-tuning tools now let you run hundreds of training experiments overnight for under $50. Here's how Autoresearch and Red Hat's platform outperformed HINT3, and the tools you can use today.

Mar 29, 2026 · 95% relevant

Jensen Huang Predicts AI Training Shift to Synthetic Data, Compute as New Bottleneck

NVIDIA CEO Jensen Huang states AI training is moving from real-world to synthetic data, with compute power becoming the primary constraint as AI-generated data quality improves.

Mar 24, 2026 · 85% relevant

Goal-Driven Data Optimization: Training Multimodal AI with 95% Less Data

Researchers introduce GDO, a framework that optimizes multimodal instruction tuning by selecting high-utility training samples. It achieves faster convergence and higher accuracy using 5-7% of the data typically required. This addresses compute inefficiency in training vision-language models.

Mar 16, 2026 · 71% relevant

AI Now Surpasses Human Experts in Technical Domains, Study Finds

New research mapping AI capabilities to human expertise reveals frontier models have already surpassed domain experts in technical and scientific benchmarks. The study forecasts AI will reach top-performer human levels by late 2027.

Mar 9, 2026 · 75% relevant

The Great GPU Scramble: How Hardware Shortages Are Defining the AI Arms Race

Oracle founder Larry Ellison identifies GPU acquisition as the primary bottleneck in AI development, with companies racing to secure limited hardware for breakthroughs in medicine, video generation, and autonomous systems.

Mar 7, 2026 · 85% relevant

Beyond Better Models: The Compute Scaling Revolution Driving AI's Next Leap

New analysis reveals that scaling compute infrastructure may deliver 10× annual efficiency gains in AI development, surpassing algorithmic improvements alone. The real leverage comes from combining innovative ideas with massive computational resources.

Feb 26, 2026 · 85% relevant

Beyond Deterministic Benchmarks: How Proxy State Evaluation Could Revolutionize AI Agent Testing

Researchers propose a new LLM-driven simulation framework for evaluating multi-turn AI agents without costly deterministic backends. The proxy state-based approach achieves 90% human-LLM judge agreement while enabling scalable, verifiable reward signals for agent training.

Feb 19, 2026 · 78% relevant

ByteDance iLLaDA: 8B Diffusion LM Matches Qwen2.5 Base, Lags on Instruct

ByteDance iLLaDA, an 8B diffusion LM trained on 12T tokens, matches Qwen2.5 7B on base benchmarks (63.9 vs 63.3) but trails by 10 points after instruction tuning, revealing an alignment gap for diffusion models.

Jun 27, 2026 · 93% relevant

Claude Mythos Scores 73% on Expert CTF, Completes Full 32-Step Network Attack

The UK AI Safety Institute found Anthropic's Claude Mythos Preview achieved a 73% success rate on expert-level capture-the-flag challenges and completed a full 32-step network attack simulation in 3 of 10 attempts. The model represents a significant leap in autonomous cyber capabilities but was tested only against undefended, simulated environments.

Apr 14, 2026 · 98% relevant

TensorFlow Playground Interactive Demo Updated for 2026, Enabling Real-Time Neural Network Visualization

The TensorFlow Playground, an educational web tool for visualizing neural networks, has been updated. Users can now adjust hyperparameters, watch the model train, and see decision boundaries update in real time.

Mar 31, 2026 · 85% relevant

Developer Swaps Dash Cam Analysis for Gemma 4 & Falcon Perception

A developer announced they are replacing their entire dash cam video analysis system with Google's Gemma 4 and Falcon Perception models, signaling a practical shift towards newer, specialized multimodal models for real-time edge applications.

Apr 15, 2026 · 75% relevant

CoRe Framework Integrates Equivariant Contrastive Learning for Medical Image Registration, Surpassing Baseline Methods

Researchers propose CoRe, a medical image registration framework that jointly optimizes an equivariant contrastive learning objective with the registration task. The method learns deformation-invariant feature representations, improving performance on abdominal and thoracic registration tasks.

Mar 26, 2026 · 75% relevant
