cmu

18 articles about cmu in AI news

CMU's Gym-Anything Turns Any Software Into Agent Training Ground

CMU's Gym-Anything automates agent environment creation, producing CUA-World with 10,000+ tasks. Even strong models fail most long tasks, showing real computer-use work is unsolved.

Jul 4, 2026 (92% relevant)

CMU Benchmark: Claude Mythos Hits 9.9/16 on V8 Exploits, GPT-5.5 Trails at 5.5

CMU's ExploitBench shows Claude Mythos scores 9.9/16 on V8 exploits vs GPT-5.5's 5.5, but costs $36,428 per run — 12x more. The cost-performance tradeoff is the real story.

May 16, 2026 (100% relevant)
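The cost-performance tradeoff in the ExploitBench item can be made concrete with a little arithmetic. The sketch below uses only the figures quoted in the summary. It assumes that "12x more" means a Claude Mythos run costs 12 times a GPT-5.5 run; the summary does not state the baseline explicitly, so the derived GPT-5.5 cost is an estimate, and the helper `cost_per_point` is a hypothetical name introduced for illustration.

```python
# Figures quoted in the summary (scores are out of 16).
mythos_score = 9.9
gpt_score = 5.5
mythos_cost = 36_428.0  # USD per run

# ASSUMPTION: "12x more" is measured against a GPT-5.5 run.
cost_ratio = 12.0
gpt_cost = mythos_cost / cost_ratio  # estimated, not reported in the summary


def cost_per_point(cost: float, score: float) -> float:
    """Dollars spent per benchmark point (hypothetical helper)."""
    return cost / score


mythos_cpp = cost_per_point(mythos_cost, mythos_score)
gpt_cpp = cost_per_point(gpt_cost, gpt_score)
score_ratio = mythos_score / gpt_score

print(f"Score ratio (Mythos / GPT-5.5): {score_ratio:.2f}x")
print(f"Cost per point, Mythos:  ${mythos_cpp:,.0f}")
print(f"Cost per point, GPT-5.5: ${gpt_cpp:,.0f} (estimated)")
print(f"Cost-per-point ratio:    {mythos_cpp / gpt_cpp:.2f}x")
```

Under that assumption, Mythos scores 1.8 times as high but pays roughly 6.7 times as much per point, which is the tradeoff the summary points to.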

MIT/Oxford/CMU Paper: AI Can Boost Then Harm Human Performance

A collaborative paper from MIT, Oxford, and Carnegie Mellon reports AI assistance can improve human performance initially, but may lead to degradation over time due to over-reliance. This challenges the assumption that AI augmentation yields monotonic benefits.

Apr 17, 2026 (85% relevant)

CMU Study: Top LLMs Fail Simple Contradiction Tests, Lack True Reasoning

Carnegie Mellon researchers tested 14 leading LLMs on simple contradiction tasks; all failed consistently, revealing fundamental reasoning gaps despite advanced benchmarks.

Apr 6, 2026 (89% relevant)

CMU Research Identifies 'Biggest Unlock' for Coding Agents: Strategic Test Execution

New research from Carnegie Mellon University suggests the key advancement for AI coding agents lies not in raw code generation, but in developing strategies for how to run and interpret tests. This shifts focus from LLM capability to agentic reasoning.

Mar 31, 2026 (87% relevant)

Stanford/CMU Study: AI Agent Benchmarks Focus on 7.6% of Jobs, Ignoring Management, Legal, and Interpersonal Work

Researchers analyzed 43 AI benchmarks against 72,000+ real job tasks and found they overwhelmingly test programming/math skills, which represent only 7.6% of actual economic work. Management, legal, and interpersonal tasks—which dominate the labor market—are almost entirely absent from evaluation.

Mar 16, 2026 (85% relevant)

MIT/Oxford Study: GPT-5 Help Boosts Scores Now, Hurts Independent Problem-Solving Later

A new paper from MIT, Oxford, and CMU finds that using GPT-5 for direct answers improves short-term scores but reduces persistence and independent performance after assistance ends. The effect is linked to outsourcing mental effort, not AI exposure itself.

Apr 16, 2026 (95% relevant)

Open-Source Multi-Agent LLM System for Complex Software Engineering Tasks Released by Academic Consortium

A consortium of researchers from Stony Brook, CMU, Yale, UBC, and Fudan University has open-sourced a multi-agent LLM system specifically architected for complex software engineering. The release aims to provide a collaborative, modular framework for tackling tasks beyond single-agent capabilities.

Mar 28, 2026 (93% relevant)

Kering Deploys AI-Powered Sustainable Sourcing Assistant on Google Cloud

Kering launched a Sustainable Sourcing Assistant on Google Cloud's Vertex AI. The tool helps luxury brands like Gucci and Saint Laurent evaluate materials for environmental and social impact, advancing sustainability in procurement.

Jul 3, 2026 (72% relevant)

Shopify Details Generative AI Use Cases for Ecommerce (2026)

Shopify's 2026 guide details generative AI use cases for ecommerce, including conversational AI for sales and product catalog management via the Storefront API. This matters as retailers seek practical AI integrations to enhance operations and customer engagement.

Jun 7, 2026 (98% relevant)

Nvidia Unveils Physical AI Agent Skills, 32B VLA Model at CVPR

Nvidia launched physical AI agent skills and a 32B VLA model at CVPR to automate AV and robotics workflows, addressing the fragmented tooling bottleneck.

Jun 3, 2026 (100% relevant)

ByteDance Open-Sources BAGEL: 7B Multimodal Model for Image Gen, Editing, Understanding

ByteDance open-sourced BAGEL, a 7B multimodal model for image gen, editing, style transfer, and understanding under Apache 2.0.

May 28, 2026 (95% relevant)

IPCCF: A New Graph-Based Approach to Disentangle User Intent for Better

A new research paper introduces Intent Propagation Contrastive Collaborative Filtering (IPCCF), a method designed to improve recommendation systems by more accurately disentangling the underlying intents behind user-item interactions. It addresses limitations in existing methods by incorporating broader graph structure and using contrastive learning for direct supervision, showing superior performance in experiments.

Apr 20, 2026 (84% relevant)

Research Shows AI Models Can 'Infect' Others with Hidden Bias

A study reveals AI models can transfer hidden biases to other models via training data, even without direct instruction. This creates a risk of bias propagation across AI ecosystems.

Apr 14, 2026 (85% relevant)

NVIDIA Advances AI Robotics with Simulation-First Training, Isaac & Jetson

NVIDIA showcased AI robotics advances using foundation models and synthetic environments for training, enabling scalable deployment in real-world sectors like agriculture and solar. Key platforms are the Isaac simulator and Jetson edge AI hardware.

Apr 8, 2026 (85% relevant)

Study: 10 Minutes with ChatGPT Cuts Problem-Solving Rate from 73% to 57%

Researchers from Carnegie Mellon, Oxford, MIT, and UCLA found that just 10 minutes of ChatGPT use reduced participants' independent problem-solving success from 73% to 57%. The effect was strongest in users who sought direct answers, whose performance fell below their original baseline.

Apr 7, 2026 (97% relevant)
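The headline figures in the ChatGPT item can be read as either an absolute or a relative decline, and the two are easy to confuse. A minimal sketch using only the 73% and 57% success rates quoted in the summary (the variable names are illustrative, not from the study):

```python
# Independent problem-solving success rates quoted in the summary.
before = 0.73
after = 0.57

# Absolute change, in percentage points.
abs_drop_pp = (before - after) * 100

# Relative change, as a fraction of the original success rate.
rel_drop = (before - after) / before

print(f"Absolute drop: {abs_drop_pp:.0f} percentage points")
print(f"Relative drop: {rel_drop:.1%} of the original rate")
```

The drop is 16 percentage points, which is a relative decline of about 22% from the starting rate.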

PhAIL: Open Benchmark for Robot AI on Real Hardware Shows Best Model at 5% of Human Throughput

Researchers have launched PhAIL (phail.ai), an open benchmark for evaluating robot AI systems on real hardware using the DROID platform, with the best-performing model achieving only 5% of human throughput and requiring intervention every 4 minutes.

Apr 2, 2026 (75% relevant)

CARLA-Air Unifies CARLA and AirSim Simulators in Single Unreal Engine Process for Embodied AI

CARLA-Air merges the CARLA autonomous driving and AirSim drone simulators into one Unreal Engine process, enabling zero-latency air-ground sensor synchronization with 18 sensor types for embodied AI training.

Apr 1, 2026 (85% relevant)
