scientific innovation
30 articles about scientific innovation in AI news
Nature Astronomy Paper Argues LLMs Threaten Scientific Authorship, Sparking AI Ethics Debate
A paper in Nature Astronomy posits a novel criterion for scientific contribution: if an LLM can easily replicate it, it may not be sufficiently novel. This directly challenges the perceived value of incremental, LLM-augmented research.
Ethan Mollick Critiques Scientific Publishing's AI Inertia: PDFs Still Dominate in 2026
Wharton professor Ethan Mollick highlights that scientific papers in 2026 are still primarily uploaded as formatted PDFs to restrictive academic archives, signaling slow adaptation to AI's potential for accelerating research.
Annealed Co-Generation: A New AI Framework Tackles Scientific Complexity Through Pairwise Modeling
Researchers propose Annealed Co-Generation, a novel AI framework that simplifies multivariate generation in scientific applications by modeling variables in pairs rather than jointly. The approach reduces computational burden and data imbalance while maintaining coherence across complex systems.
From Code to Discovery: The Next Frontier of AI Agents in Research
AI researcher Omar Saray predicts a shift from 'agentic coding' to 'agentic research'—where AI systems will autonomously conduct scientific discovery. This evolution promises to accelerate innovation across disciplines.
Columbia Prof: LLMs Can't Generate New Science, Only Map Known Data
Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries. True discovery requires creating new conceptual maps, a capability current architectures lack.
ML-Master 2.0 Hits 56.44% on MLE-Bench in 24-Hour Agentic Science Run
Researchers from Shanghai Jiao Tong University demonstrated ML-Master 2.0, an autonomous research agent that operated continuously for 24 hours on the MLE-Bench, achieving a 56.44% medal rate. The breakthrough centers on Hierarchical Cognitive Caching for state management, not reasoning, enabling long-horizon scientific workflows.
BloClaw: New AI4S 'Operating System' Cuts Agent Tool-Calling Errors to 0.2% with XML-Regex Protocol
Researchers introduced BloClaw, a unified operating system for AI-driven scientific discovery that replaces fragile JSON tool-calling with a dual-track XML-Regex protocol, cutting error rates from 17.6% to 0.2%. The system autonomously captures dynamic visualizations and provides a morphing UI, benchmarked across cheminformatics, protein folding, and molecular docking.
Meta's Hyperagents Enable Self-Referential AI Improvement, Achieving 0.710 Accuracy on Paper Review
Meta researchers introduce Hyperagents, where the self-improvement mechanism itself can be edited. The system autonomously discovered innovations like persistent memory, improving from 0.0 to 0.710 test accuracy on paper review tasks.
Nature Report: China's Public R&D Spending Nears US Levels, Shifting Global Science Funding Landscape
A new Nature report indicates China is close to surpassing the US in public R&D spending. This shift in funding could alter which nation sets the global pace for scientific research, though China still lags in fundamental research output.
Mirendil: Ex-Anthropic Scientists Launch $1B Venture to Build AI That Thinks Like a Scientist
Former Anthropic researchers are raising $175M at a $1B valuation for Mirendil, a startup aiming to build AI systems for long-term scientific reasoning. The goal is to accelerate breakthroughs in biology and materials science, aligning with a broader industry push toward autonomous AI researchers.
AI Research Accelerator: Autonomous System Completes 700 Experiments in 48 Hours, Optimizing Model Training
An AI system autonomously conducted 700 experiments over two days, reducing GPT-2 training time by 11%. This breakthrough demonstrates AI's growing capability to accelerate scientific research and optimize complex processes without human intervention.
Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces
Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.
Google Titan: A New Architecture That Could Dethrone Transformers
Google's Titan architecture claims to surpass Transformers on long-context tasks via neural long-term memory, achieving 1.2x-2.5x speedups on benchmarks.
MIT's RLM Handles 10M+ Tokens, Outperforms RAG on Long-Context Benchmarks
MIT researchers introduced Recursive Language Models (RLMs), which treat long documents as an external environment and use code to search, slice, and filter data, achieving 58.00 on a hard long-context benchmark versus 0.04 for standard models.
VoteGCL: A Novel LLM-Augmented Framework to Combat Data Sparsity in
A new paper introduces VoteGCL, a framework that uses few-shot LLM prompting and majority voting to create high-confidence synthetic data for graph-based recommendation systems. It integrates this data via graph contrastive learning to improve accuracy and mitigate bias, outperforming existing baselines.
A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search
A new research paper presents a reference architecture for 'agentic hybrid retrieval' that orchestrates BM25, dense embeddings, and LLM agents to handle underspecified queries against sparse metadata. It introduces offline metadata augmentation and analyzes two architectural styles for quality attributes like governance and performance.
Ethan Mollick: OpenAI's O1 Release Was Second Most Important LLM Launch
Ethan Mollick tweeted that OpenAI's O1 launch was the second most important LLM release after GPT-3.5, featuring a pivotal chart. He expressed surprise that OpenAI disclosed its biggest AI advance rather than keeping it proprietary.
Gur Singh Claims 7 M4 MacBooks Match A100, Calls Cloud GPU Training a 'Scam'
Developer Gur Singh posted that seven M4 MacBooks (2.9 TFLOPS each) match an NVIDIA A100's performance, calling cloud GPU training a 'scam' and advocating for distributed, consumer-hardware approaches.
DOE Seeks Input on AI Infrastructure for Federal Lands
The U.S. Department of Energy has published a Request for Information (RFI) to solicit input on developing AI and high-performance computing infrastructure on DOE-owned lands. This marks a significant step in the federal government's strategy to directly address the national AI compute shortage.
Anthropic Appoints Novartis CEO Vas Narasimhan to Board via Benefit Trust
Anthropic's independent governance body appointed Vas Narasimhan, CEO of pharmaceutical giant Novartis, to its board. This move connects frontier AI development directly with global healthcare leadership.
LABBench2 Benchmark Shows AI Biology Agents Struggle with Real-World Tasks
Researchers introduced LABBench2, a 1,900-task benchmark for AI in biology research. It shows current models perform 26-46% worse on realistic tasks versus simplified ones, exposing a critical capability gap.
Stanford 2026 AI Index: Models Beat Human Baselines, U.S.-China Gap Narrows
The 423-page Stanford 2026 AI Index Report reveals frontier AI models now match or exceed human baselines on hard coding, science, and math tests. Global AI adoption has hit ~53% in just three years, while the U.S.-China capability gap shrinks.
Embedding Matching Distills Genomic Models 200x, Matches mRNA-Bench Performance
A new distillation framework transfers mRNA representations from a large genomic foundation model to a specialized model 200x smaller. It uses embedding-level distillation, outperforming logit-based methods and competing with larger models on mRNA-bench.
MIA Agent Enables 7B Models to Outperform GPT-5.4 on Research Tasks
Researchers introduced MIA, a Manager-Planner-Executor framework that transforms 7B parameter models into active research strategists. The system reportedly outperforms GPT-5.4 through continual learning during task execution.
Anthropic Withholds 'Mythos' AI Model Citing Unspecified Risk Concerns
Anthropic has reportedly chosen to withhold a new AI model, internally called 'Mythos', from public release. The decision is based on an internal assessment of potential risks, though specific capabilities or benchmarks were not disclosed.
OpenAI Publishes 'Intelligence Age' Policy Blueprint for Superintelligence Transition
OpenAI published a policy blueprint outlining governance and economic proposals for the 'Intelligence Age,' framing superintelligence as an active transition requiring new safety nets and international coordination.
ASI-Evolve: This AI Designs Better AI Than Humans Can — 105 New Architectures, Zero Human Guidance
Researchers built an AI that runs the entire research cycle on its own — reading papers, designing experiments, running them, and learning from results. It discovered 105 architectures that beat human-designed models, and invented new learning algorithms. Open-sourced.
AI Forecasters Revise AGI Timeline: Key Milestones Pulled Forward to 2029-2030 After Recent Model Progress
A significant update from AI forecasters indicates key AGI milestones have been pulled forward, with the median prediction for AGI arrival shifting from 2032 to 2029-2030. This revision follows rapid progress in recent model capabilities, particularly in reasoning and tool use.
Sam Altman Hints at OpenAI Acquisition Targeting 'Mixture' of Product Company and Research Lab
In an interview, OpenAI CEO Sam Altman indicated the company is considering an acquisition that looks like 'a mixture' of both a product company and a research lab. This suggests a strategic move to acquire teams that can both advance AI capabilities and rapidly productize them.
Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC
A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achieving 0.81 AUC. This enables benchmark designers to calibrate difficulty without expensive evaluations.