scientific method

30 articles about scientific method in AI news

Google Launches PaperBanana AI to Format Raw Methods into Publication Text

Google has launched PaperBanana, an AI tool designed to transform unstructured methodology notes into polished, publication-ready text. It targets a key bottleneck in academic writing by automating the formatting and structuring of methods sections.

Apr 16, 2026 · 87% relevant

Nature Astronomy Paper Argues LLMs Threaten Scientific Authorship, Sparking AI Ethics Debate

A paper in Nature Astronomy posits a novel criterion for scientific contribution: if an LLM can easily replicate it, it may not be sufficiently novel. This directly challenges the perceived value of incremental, LLM-augmented research.

Apr 4, 2026 · 85% relevant

Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy

Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.

Mar 31, 2026 · 85% relevant

Ethan Mollick Critiques Scientific Publishing's AI Inertia: PDFs Still Dominate in 2026

Wharton professor Ethan Mollick highlights that scientific papers in 2026 are still primarily uploaded as formatted PDFs to restrictive academic archives, signaling slow adaptation to AI's potential for accelerating research.

Mar 31, 2026 · 87% relevant

ChatGPT GPT-5.4 Pro's 'Thinking' Harness Shows Advanced Scientific Paper Comprehension, Including Figure Analysis

OpenAI's ChatGPT GPT-5.4 Pro, with its 'Thinking' harness, demonstrates advanced multimodal understanding of scientific papers, identifying key figures and extracting visual information beyond text parsing.

Mar 30, 2026 · 85% relevant

Claude Code's New Research Mode: How to Apply Scientific Coding Breakthroughs to Your Projects

Claude Code's Research Mode, powered by Opus 4.6, can accelerate complex scientific coding. Here's how to configure it for your own data-intensive workflows.

Mar 24, 2026 · 95% relevant

Stanford and Munich Researchers Pioneer Tool Verification Method to Prevent AI's Self-Training Pitfalls

Researchers from Stanford and the University of Munich have developed a verification system that uses code checkers to prevent AI models from reinforcing incorrect patterns during self-training. The method improves mathematical reasoning accuracy by up to 31.6%.

Mar 11, 2026 · 94% relevant

Annealed Co-Generation: A New AI Framework Tackles Scientific Complexity Through Pairwise Modeling

Researchers propose Annealed Co-Generation, a novel AI framework that simplifies multivariate generation in scientific applications by modeling variables in pairs rather than jointly. The approach reduces computational burden and data imbalance while maintaining coherence across complex systems.

Mar 10, 2026 · 75% relevant

AI Bridges the Gap Between Data and Discovery: New Framework Aligns Scientific Observations with Decades of Literature

Researchers have developed a novel AI framework that aligns X-ray spectra with scientific literature using contrastive learning. This multimodal approach improves physical variable estimation by 16-18% and identifies high-priority astronomical targets, demonstrating how AI can accelerate scientific discovery by connecting data with domain knowledge.

Mar 6, 2026 · 75% relevant

EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation

Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.

Feb 25, 2026 · 70% relevant

AI's Causal Reasoning Gap: New Method Tests How Well Models Understand 'What If' Scenarios

Researchers introduce Double Counterfactual Consistency (DCC), a training-free method to evaluate and improve LLMs' causal reasoning. The technique reveals significant weaknesses in how models handle hypothetical scenarios and counterfactual thinking, addressing a critical limitation in current AI systems.

Feb 20, 2026 · 75% relevant

ResearchGym Exposes AI's 'Capability-Reliability Gap' in Scientific Discovery

A new benchmark called ResearchGym reveals that while frontier AI agents can occasionally achieve state-of-the-art scientific results, they fail to do so reliably. In controlled evaluations, agents completed only 26.5% of research sub-tasks on average, highlighting critical limitations in autonomous scientific discovery.

Feb 18, 2026 · 78% relevant

AI Crosses the Rubicon: From Scientific Tool to Active Discovery Partner

This week marked a paradigm shift as AI systems transitioned from research tools to active participants in scientific discovery. OpenAI's GPT-5.2 Pro helped conjecture a new formula in particle physics, while Google's Gemini 3 Deep Think achieved unprecedented results on reasoning benchmarks. These developments signal AI's growing capacity for genuine scientific contribution.

Feb 17, 2026 · 85% relevant

LANL Taps NVIDIA Vera CPUs for 7x Agentic AI Speed on Scientific Workloads

LANL has selected NVIDIA Vera CPUs for three supercomputers, claiming 7x the performance of x86 on agentic AI workloads. The systems, named Mission, Vision, and Veritas, are slated to deploy by 2027.

Jun 22, 2026 · 98% relevant

Frozen Giants Aligned: New AI Method Bridges Vision and Language Without Training

Researchers have developed HDFLIM, a novel framework that aligns powerful frozen vision and language models using hyperdimensional computing. This approach enables efficient image captioning without computationally intensive fine-tuning, preserving original model capabilities while creating cross-modal understanding.

Mar 2, 2026 · 75% relevant

AI Gets a Confidence Meter: New Method Tackles LLM Hallucinations in Interpretable Models

Researchers propose an uncertainty-aware framework for Concept Bottleneck Models that quantifies and incorporates the reliability of LLM-generated concept labels, addressing critical hallucination risks while maintaining model interpretability.

Mar 2, 2026 · 80% relevant

Teaching AI to Think Before It Speaks: New Method Boosts Reasoning Stability

Researchers have developed Metacognitive Behavioral Tuning (MBT), a framework that teaches large language models human-like self-regulation during complex reasoning. This approach addresses the 'reasoning collapse' phenomenon where models fail despite correct intermediate steps, achieving higher accuracy with fewer computational resources.

Feb 27, 2026 · 80% relevant

Claude Opus 4.7 Matches Dedicated NMR Software on Chemistry Tasks

According to an Anthropic blog post, Claude Opus 4.7 matches dedicated NMR software on chemistry tasks, but the methodology and benchmarks are undisclosed.

Jun 5, 2026 · 94% relevant

PRL-Bench: LLMs Score Below 50% on End-to-End Physics Research Tasks

Researchers introduced PRL-Bench, a benchmark built from 100 recent Physical Review Letters papers, testing LLMs on end-to-end physics research. Top models scored below 50%, exposing a significant capability gap for autonomous scientific discovery.

Apr 20, 2026 · 100% relevant

Researchers Achieve Ultra-Long-Horizon Agentic Science with Cohesive AI Agents

A research team has developed AI agents capable of executing and maintaining coherent, long-horizon scientific research workflows. This addresses a core challenge in creating autonomous systems for complex discovery.

Apr 20, 2026 · 85% relevant

OpenAI Quietly Phasing Out MRCR Benchmark in Claude Evaluations

An OpenAI engineer confirmed the company is phasing out the MRCR benchmark from Claude's system card, citing its poor correlation with real-world performance and high evaluation cost. This reflects a broader industry move toward more practical, cost-effective evaluation methods.

Apr 16, 2026 · 75% relevant

Google's PaperBanana AI Generates Academic Diagrams, Beats Human Designs 3:1

Google released PaperBanana, an AI system that transforms raw methodology text into publication-ready academic diagrams using a 5-agent creative pipeline. In blind evaluations, humans preferred its outputs nearly 3 out of 4 times over manually designed figures.

Apr 16, 2026 · 95% relevant

Anthropic's AI Researchers Outperform Humans, Discover Novel Science

Anthropic reports its AI systems for alignment research are surpassing human scientists in performance and generating novel scientific concepts, broadening the exploration space for AI safety.

Apr 14, 2026 · 95% relevant

Embedding Matching Distills Genomic Models 200x, Matches mRNA-Bench Performance

A new distillation framework transfers mRNA representations from a large genomic foundation model to a specialized model 200x smaller. It uses embedding-level distillation, outperforming logit-based methods and competing with larger models on mRNA-bench.

Apr 13, 2026 · 86% relevant

Google's AutoWrite AI Generates Research Papers from Scratch

Google published a paper detailing AutoWrite, an AI system that can generate complete research papers from scratch. This represents a significant step toward automating the scientific writing process.

Apr 8, 2026 · 75% relevant

Study of 1,222 Users Claims ChatGPT Use Reduces Cognitive Effort

A viral social media post references a study of 1,222 people, claiming it proves ChatGPT use reduces cognitive effort. The claim lacks published methodology or data, highlighting the ongoing debate over AI's impact on human cognition.

Apr 7, 2026 · 87% relevant

AI Research Loop Paper Claims Automated Experimentation Can Accelerate AI Development

A shared paper highlights research into using AI to run a mostly automated loop of experiments, suggesting a method to speed up AI research itself. The source notes a potential problem with the approach but does not specify details.

Apr 4, 2026 · 85% relevant

Mercor Data Breach Exposes Expert Human Annotation Pipeline Used by Frontier AI Labs

Hackers have reportedly accessed Mercor's expert human data collection systems, which are used by leading AI labs to build foundation models. This breach could expose proprietary training methodologies and sensitive model development data.

Apr 1, 2026 · 91% relevant

Diffusion Recommender Models Fail Reproducibility Test: Study Finds 'Illusion of Progress' in Top-N Recommendation Research

A reproducibility study of nine recent diffusion-based recommender models finds only 25% of reported results are reproducible. Well-tuned simpler baselines outperform the complex models, revealing a conceptual mismatch and widespread methodological flaws in the field.

Mar 30, 2026 · 82% relevant

Kyushu University AI Model Achieves 44.4% Solar Cell Efficiency, Surpassing Theoretical SQ Limit

Researchers at Kyushu University used an AI-driven inverse design method to create a photonic crystal solar cell with 44.4% efficiency, exceeding the 33.7% Shockley-Queisser limit for single-junction cells.

Mar 28, 2026 · 85% relevant
