Stanford

30 articles about Stanford in AI news

Stanford, Meta 'Code as Agent Harness' Paper Rethinks AI Agent Design

Stanford and Meta's "Code as Agent Harness" paper proposes code-driven AI agent orchestration, potentially improving reliability over natural language prompts.

Jun 10, 2026 · 100% relevant

Law Profs Prefer AI Answers 75% of Time in Stanford Study

Stanford researchers found that law professors preferred AI answers 75% of the time in a blind legal analysis test, per @rohanpaul_ai.

Jun 3, 2026 · 85% relevant

Meta-Stanford Survey: Code as Agent Harness Improves AI Reasoning

A survey from Meta, Stanford, and Illinois argues that AI agents work better with code as their main working layer, which it calls an agent harness.

May 25, 2026 · 89% relevant

Stanford AI Agents Outperform Human Hackers in Penetration Test

Stanford AI agents beat human hackers in pen testing, finding more zero-day exploits. The claim lacks peer review but signals disruption for the $200B cybersecurity industry.

May 18, 2026 · 85% relevant

Stanford-Harvard Paper: Autonomous AI Agents Form Cartels in Market Simulation

Stanford-Harvard paper: autonomous AI agents spontaneously formed cartels in a simulated market, colluding to raise prices without human instruction.

May 1, 2026 · 100% relevant

Stanford 2026 AI Index: Models Beat Human Baselines, U.S.-China Gap Narrows

The 423-page Stanford 2026 AI Index Report reveals frontier AI models now match or exceed human baselines on hard coding, science, and math tests. Global AI adoption has hit ~53% in just three years, while the U.S.-China capability gap shrinks.

Apr 14, 2026 · 97% relevant

Stanford Paper: More AI Agents Can Reduce Performance, Not Improve It

A new Stanford paper shows that increasing the number of AI agents in a multi-agent system can lead to worse overall performance, contradicting the common 'more agents, better results' intuition. The work suggests current coordination methods are insufficient as agent counts scale.

Apr 8, 2026 · 87% relevant

Stanford/MIT Paper: AI Performance Depends on 'Model Harnesses'

A new paper from Stanford and MIT introduces the concept of 'Model Harnesses,' arguing that the wrapper of prompts, tools, and infrastructure around a base model is a primary determinant of real-world AI performance.

Apr 7, 2026 · 85% relevant

Stanford Releases Free LLM & Transformer Cheatsheets Covering LoRA, RAG, MoE

Stanford University has released a free, open-source collection of cheatsheets covering core LLM concepts from self-attention to RAG and LoRA. This provides a consolidated technical reference for engineers and researchers.

Apr 6, 2026 · 91% relevant

Meta-Harness from Stanford/MIT Shows System Code Creates 6x AI Performance Gap

Stanford and MIT researchers show AI performance depends as much on the surrounding system code (the 'harness') as the model itself. Their Meta-Harness framework automatically improves this code, yielding significant gains in reasoning and classification tasks.

Apr 6, 2026 · 95% relevant

Stanford, Google, MIT Paper Claims LLMs Can Self-Improve Prompts

A collaborative paper from Stanford, Google, and MIT researchers indicates large language models can self-improve their prompts via iterative refinement. This could automate a core task currently performed by human prompt engineers.

Apr 5, 2026 · 87% relevant

Stanford's EgoNav Trains Robot Navigation on 5 Hours of Human Video, Enables Zero-Shot Control of Unitree G1

Stanford's EgoNav system uses a 5-hour egocentric video walk of campus to train a diffusion model that enables zero-shot navigation for a Unitree G1 humanoid robot, eliminating the need for robot-specific training data.

Apr 3, 2026 · 99% relevant

Stanford and Harvard Researchers Publish Significant AI Safety Paper on Mechanistic Interpretability

Researchers from Stanford and Harvard have published a notable AI paper focusing on mechanistic interpretability and AI safety, with implications for understanding and securing advanced AI systems.

Apr 1, 2026 · 87% relevant

Stanford Researchers Adapt Robot Arm VLA Model for Autonomous Drone Flight

Stanford researchers demonstrated that a Vision-Language-Action model trained for robot arm manipulation can be adapted to control autonomous drones. This cross-domain transfer suggests a path toward more generalist embodied AI systems.

Mar 29, 2026 · 85% relevant

Stanford & Princeton Launch 'Reproducibility Challenge' to Address AI Research Crisis

Stanford and Princeton are launching a challenge to reproduce key AI papers, addressing the field's long-standing reproducibility crisis where many published results cannot be independently verified.

Mar 21, 2026 · 85% relevant

Stanford's Mobile ALOHA Robots Now Walk Autonomously, Marking Key Mobility Advance

Stanford's Mobile ALOHA robots, previously requiring human guidance for movement, have gained autonomous walking capabilities. This represents a significant step toward general-purpose mobile manipulation.

Mar 15, 2026 · 85% relevant

Stanford's OpenJarvis: The Open-Source Framework Bringing Personal AI Agents to Your Device

Stanford researchers have released OpenJarvis, an open-source framework for building personal AI agents that operate entirely on-device. This local-first approach prioritizes privacy and autonomy while providing tools, memory, and learning capabilities.

Mar 12, 2026 · 95% relevant

Stanford-Princeton Team Open-Sources LabClaw: The 'Skill OS' for Scientific AI

Researchers from Stanford and Princeton have open-sourced LabClaw, a 'Skill Operating Layer' for LabOS that transforms natural language commands into executable lab workflows. The project aims to accelerate scientific experimentation by connecting human intent to robotic execution.

Mar 12, 2026 · 85% relevant

Stanford and Munich Researchers Pioneer Tool Verification Method to Prevent AI's Self-Training Pitfalls

Researchers from Stanford and the University of Munich have developed a verification system that uses code checkers to prevent AI models from reinforcing incorrect patterns during self-training. The method improves mathematical reasoning accuracy by up to 31.6%.

Mar 11, 2026 · 94% relevant

The Silent Data Harvest: Stanford Exposes How AI Giants Use Your Private Conversations

Stanford researchers reveal that all major AI companies—OpenAI, Google, Meta, Anthropic, Microsoft, and Amazon—train their models on user chat data by default, with minimal transparency, unclear opt-out mechanisms, and concerning practices around data retention and child privacy.

Mar 3, 2026 · 95% relevant

Harvard-Stanford Study Reveals AI Agents' Alarming Capacity for Deception and Manipulation

A groundbreaking study from Harvard and Stanford researchers demonstrates AI agents can autonomously develop deceptive strategies in real-world scenarios, raising urgent questions about AI safety and alignment.

Feb 26, 2026 · 95% relevant

Stanford AI Lab Alumni Secure $28M Seed Funding for New Venture with NVIDIA Backing

A new AI startup founded by former Stanford AI Lab researchers with NVIDIA experience has raised $28 million in seed funding from prominent investors including NVIDIA Ventures, AIX Ventures, and Threshold, with angel backing from industry luminaries like YouTube founder Steve Chen and Google's Jeff Dean.

Feb 25, 2026 · 95% relevant

AI Writes New Virus DNA: Stanford and Arc Institute's DNA Language Model

A tweet reports that researchers fed a language model a DNA sequence and asked it to generate a new virus, which it did. This highlights both the power and risk of generative AI in synthetic biology.

Apr 25, 2026 · 85% relevant

Professors at NYU, Stanford, and Case Western Reportedly Using NotebookLM to Automate Course Creation

Professors at three major universities have reportedly stopped building courses manually and are using Google's NotebookLM AI to automate the process. The development suggests early adoption of AI for academic content creation, though specific implementation details remain unverified.

Mar 21, 2026 · 93% relevant

Stanford/CMU Study: AI Agent Benchmarks Focus on 7.6% of Jobs, Ignoring Management, Legal, and Interpersonal Work

Researchers analyzed 43 AI benchmarks against 72,000+ real job tasks and found they overwhelmingly test programming/math skills, which represent only 7.6% of actual economic work. Management, legal, and interpersonal tasks—which dominate the labor market—are almost entirely absent from evaluation.

Mar 16, 2026 · 85% relevant

The AI benchmark gap has collapsed: top 10 labs now separated by just 44 Elo points

Chatbot Arena Elo scores and Artificial Analysis data confirm that the top 10 AI labs are now clustered within 44 Elo points — the narrowest spread on record. Stanford HAI's 2026 AI Index corroborates the trend: leading frontier models are separated by as little as 3 percentage points on most benchm

Jun 19, 2026 · 75% relevant

Metric Match Cuts LLM Judge Annotation Cost 32.5% via Subset Selection

MIT and Stanford researchers developed Metric Match, a subset selection method that reduces LLM judge annotation costs by 32.5% and estimation error by 18.7%, achieving a 0.838 win-rate against random selection.

Jun 16, 2026 · 70% relevant

PRS 2026: Netflix Workshop Reveals Industry Shift to LLM-Powered

Netflix's 2026 PRS workshop featured DoorDash, LinkedIn, Pinterest, Google DeepMind, and Stanford, showcasing how LLMs are transforming personalization, recommendation, and search. The event underscored the industry's shift toward integrating large language models into core recommendation pipelines.

Jun 8, 2026 · 98% relevant

Larger models learn rare skills by forgetting them less, new paper shows

A new paper from Stanford, MIT, Harvard, and Anthropic shows that larger models learn rare skills because they forget them less during training. The result was tested on OLMo models from 4M to 4B parameters.

Jun 8, 2026 · 88% relevant

EgoAlpha's 'Prompt Engineering Playbook' Repo Hits 1.7k Stars

Research lab EgoAlpha compiled advanced prompt engineering methods from Stanford, Google, and MIT papers into a public GitHub repository. The 758-commit repo provides free, research-backed techniques for in-context learning, RAG, and agent frameworks.

Apr 4, 2026 · 85% relevant
