scientific ai

30 articles about scientific ai in AI news

Stanford-Princeton Team Open-Sources LabClaw: The 'Skill OS' for Scientific AI

Researchers from Stanford and Princeton have open-sourced LabClaw, a 'Skill Operating Layer' for LabOS that transforms natural language commands into executable lab workflows. This breakthrough promises to dramatically accelerate scientific experimentation by bridging human intent with robotic execution.

85% relevant

Ethan Mollick Critiques Scientific Publishing's AI Inertia: PDFs Still Dominate in 2026

Wharton professor Ethan Mollick highlights that scientific papers in 2026 are still primarily uploaded as formatted PDFs to restrictive academic archives, signaling slow adaptation to AI's potential for accelerating research.

87% relevant

Annealed Co-Generation: A New AI Framework Tackles Scientific Complexity Through Pairwise Modeling

Researchers propose Annealed Co-Generation, a novel AI framework that simplifies multivariate generation in scientific applications by modeling variables in pairs rather than jointly. The approach reduces computational burden and data imbalance while maintaining coherence across complex systems.

75% relevant

AI Bridges the Gap Between Data and Discovery: New Framework Aligns Scientific Observations with Decades of Literature

Researchers have developed a novel AI framework that aligns X-ray spectra with scientific literature using contrastive learning. This multimodal approach improves physical variable estimation by 16-18% and identifies high-priority astronomical targets, demonstrating how AI can accelerate scientific discovery by connecting data with domain knowledge.

75% relevant

EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation

Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.

70% relevant

Beyond Superintelligence: How AI's Micro-Alignment Choices Shape Scientific Integrity

New research reveals AI models can be manipulated into scientific misconduct like p-hacking, exposing vulnerabilities in their ethical guardrails. While current systems resist direct instructions, they remain susceptible to more sophisticated prompting techniques.

85% relevant

ResearchGym Exposes AI's 'Capability-Reliability Gap' in Scientific Discovery

A new benchmark called ResearchGym reveals that while frontier AI agents can occasionally achieve state-of-the-art scientific results, they fail to do so reliably. In controlled evaluations, agents completed only 26.5% of research sub-tasks on average, highlighting critical limitations in autonomous scientific discovery.

78% relevant

AI Crosses the Rubicon: From Scientific Tool to Active Discovery Partner

This week marked a paradigm shift as AI systems transitioned from research tools to active participants in scientific discovery. OpenAI's GPT-5.2 Pro helped conjecture a new formula in particle physics, while Google's Gemini 3 Deep Think achieved unprecedented results on reasoning benchmarks. These developments signal AI's growing capacity for genuine scientific contribution.

85% relevant

Nature Astronomy Paper Argues LLMs Threaten Scientific Authorship, Sparking AI Ethics Debate

A paper in Nature Astronomy posits a novel criterion for scientific contribution: if an LLM can easily replicate it, it may not be sufficiently novel. This directly challenges the perceived value of incremental, LLM-augmented research.

85% relevant

ChatGPT GPT-5.4 Pro's 'Thinking' Harness Shows Advanced Scientific Paper Comprehension, Including Figure Analysis

OpenAI's ChatGPT GPT-5.4 Pro, with its 'Thinking' harness, demonstrates advanced multimodal understanding of scientific papers, identifying key figures and extracting visual information beyond text parsing.

85% relevant

Claude Code's New Research Mode: How to Apply Scientific Coding Breakthroughs to Your Projects

Claude Code's Research Mode, powered by Opus 4.6, can accelerate complex scientific coding. Here's how to configure it for your own data-intensive workflows.

95% relevant

Claude Code's 'Long-Running' Mode Unlocks Scientific Computing Workflows

Anthropic's new 'long-running Claude' capability enables Claude Code to handle extended scientific computing tasks—here's how to use it for data analysis, simulations, and research pipelines.

70% relevant

Altman: Next-Gen AI Models to Aid 'Career-Defining' Scientific Discovery

OpenAI CEO Sam Altman stated that upcoming AI models will assist researchers in making 'career-defining' discoveries, though he tempered expectations of immediate Nobel-level breakthroughs.

87% relevant

ClaudePrism: A Local, Open-Source Workspace for Scientific Writing with Claude Code

ClaudePrism is a new desktop app that runs Claude Code locally, letting you write academic papers with PDF analysis, templates, and version control—all without cloud uploads.

87% relevant

Researchers Achieve Ultra-Long-Horizon Agentic Science with Cohesive AI Agents

A research team has developed AI agents capable of executing and maintaining coherent, long-horizon scientific research workflows. This addresses a core challenge in creating autonomous systems for complex discovery.

85% relevant

Anthropic's AI Researchers Outperform Humans, Discover Novel Science

Anthropic reports its AI systems for alignment research are surpassing human scientists in performance and generating novel scientific concepts, broadening the exploration space for AI safety.

95% relevant

Google's AutoWrite AI Generates Research Papers from Scratch

Google published a paper detailing AutoWrite, an AI system that can generate complete research papers from scratch. This represents a significant step toward automating the scientific writing process.

75% relevant

Sam Altman Outlines 3 AI Futures: Research, Operations, Personal Agents

OpenAI CEO Sam Altman outlined three potential outcomes for AI development: systems that conduct scientific research, accelerate company operations, and serve as trusted personal agents. This vision frames the strategic direction for OpenAI and the broader industry.

85% relevant

BloClaw: New AI4S 'Operating System' Cuts Agent Tool-Calling Errors to 0.2% with XML-Regex Protocol

Researchers introduced BloClaw, a unified operating system for AI-driven scientific discovery that replaces fragile JSON tool-calling with a dual-track XML-Regex protocol, cutting error rates from 17.6% to 0.2%. The system autonomously captures dynamic visualizations and provides a morphing UI, benchmarked across cheminformatics, protein folding, and molecular docking.

75% relevant

Anthropic Launches Dedicated Science Blog to Chronicle AI Research and Applications

Anthropic has launched a new Science Blog to publish its research and case studies on using AI to accelerate scientific discovery, aligning with its mission to increase the pace of scientific progress.

85% relevant

Mirendil: Ex-Anthropic Scientists Launch $1B Venture to Build AI That Thinks Like a Scientist

Former Anthropic researchers are raising $175M at a $1B valuation for Mirendil, a startup aiming to build AI systems for long-term scientific reasoning. The goal is to accelerate breakthroughs in biology and materials science, aligning with a broader industry push toward autonomous AI researchers.

95% relevant

AI Research Accelerator: Autonomous System Completes 700 Experiments in 48 Hours, Optimizing Model Training

An AI system autonomously conducted 700 experiments over two days, reducing GPT-2 training time by 11%. This breakthrough demonstrates AI's growing capability to accelerate scientific research and optimize complex processes without human intervention.

85% relevant

AI Now Surpasses Human Experts in Technical Domains, Study Finds

New research mapping AI capabilities to human expertise reveals frontier models have already surpassed domain experts in technical and scientific benchmarks. The study forecasts AI will reach top-performer human levels by late 2027.

75% relevant

From Code to Discovery: The Next Frontier of AI Agents in Research

AI researcher Omar Saray predicts a shift from 'agentic coding' to 'agentic research'—where AI systems will autonomously conduct scientific discovery. This evolution promises to accelerate innovation across disciplines.

85% relevant

Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces

Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.

85% relevant

From Primitive Unicorns to Complex Diagrams: How Gemini 3.1's 'Sparks Unicorn' Signals a New Era in AI Reasoning

Google's Gemini 3.1 model has demonstrated a remarkable leap in reasoning by creating a complex unicorn diagram using TikZ, a scientific diagramming language never designed for artistic illustration. This achievement revisits and dramatically surpasses the original 'sparks of AGI' benchmark from 2022.

85% relevant

Demis Hassabis Proposes 'Einstein Test' as AGI Benchmark

Demis Hassabis has proposed a novel benchmark for AGI: a model trained only on human knowledge up to 1911 must independently derive Einstein's theory of general relativity. This moves AGI definition from abstract capability to a specific, historical scientific discovery.

87% relevant

ML-Master 2.0 Hits 56.44% on MLE-Bench in 24-Hour Agentic Science Run

Researchers from Shanghai Jiao Tong University demonstrated ML-Master 2.0, an autonomous research agent that operated continuously for 24 hours on the MLE-Bench, achieving a 56.44% medal rate. The breakthrough centers on Hierarchical Cognitive Caching for state management, not reasoning, enabling long-horizon scientific workflows.

85% relevant

PRL-Bench: LLMs Score Below 50% on End-to-End Physics Research Tasks

Researchers introduced PRL-Bench, a benchmark built from 100 recent Physical Review Letters papers, testing LLMs on end-to-end physics research. Top models scored below 50%, exposing a significant capability gap for autonomous scientific discovery.

100% relevant

Wharton Professor Argues First AGI Would Be Kept Secret for Financial Market Domination

Wharton professor Ethan Mollick posits that the first lab to develop a superhuman AI would likely deploy it secretly in financial markets for profit, rather than commercializing it via API. This highlights a strategic tension between immediate financial gain and open scientific progress in the AGI race.

85% relevant