research

30 articles about research in AI news

PhD Researcher Replaces Notion & Email Tools with AI Agent 'Muse'

A researcher has reportedly replaced multiple productivity tools (Notion, note-taking apps, inbox triage) with a custom AI agent named 'Muse'. This highlights a growing trend of using specialized AI agents to consolidate workflows.

87% relevant

OpenAI President Teases 'Spud' Model, Two Years of Research

OpenAI President Greg Brockman briefly mentioned an upcoming model codenamed 'Spud', stating it represents 'two years worth of research that is coming to fruition.' No technical details or release timeline were provided.

85% relevant

Sam Altman Outlines 3 AI Futures: Research, Operations, Personal Agents

OpenAI CEO Sam Altman outlined three potential outcomes for AI development: systems that conduct scientific research, accelerate company operations, and serve as trusted personal agents. This vision frames the strategic direction for OpenAI and the broader industry.

85% relevant

AI Research Loop Paper Claims Automated Experimentation Can Accelerate AI Development

A shared paper highlights research into using AI to run a mostly automated loop of experiments, suggesting a method to speed up AI research itself. The source notes a potential problem with the approach but does not specify details.

85% relevant

New Research Paper Identifies Multi-Tool Coordination as Critical Failure Point for AI Agents

A new research paper posits that the primary failure mode for AI agents is not in calling individual tools, but in reliably coordinating sequences of many tools over extended tasks. This reframes the core challenge from single-step execution to multi-step orchestration and state management.

85% relevant

ASI-Evolve Automates AI Research Loop, Discovers 105 Better Linear Attention Designs and Boosts AMC32 Scores by 12.5 Points

Researchers developed ASI-Evolve, an AI system that automates experimental loops in AI research. It discovered 105 improved linear attention variants and boosted AMC32 scores by 12.5 points, demonstrating automated research acceleration.

95% relevant

OpenAI Reallocates Compute and Talent Toward 'Automated Researchers' and Agent Systems

OpenAI is reallocating significant compute resources and engineering talent toward developing 'automated researchers' and agent-based systems capable of executing complex tasks end-to-end, signaling a strategic pivot away from some existing projects.

89% relevant

Andrej Karpathy's Personal Knowledge Management System Uses LLM Embeddings Without RAG for 400K-Word Research Base

AI researcher Andrej Karpathy has developed a personal knowledge management system that processes 400,000 words of research notes using LLM embeddings rather than traditional RAG architecture. The system enables semantic search, summarization, and content generation directly from his Obsidian vault.

91% relevant

New Research: Fine-Tuned LLMs Outperform GPT-5 for Probabilistic Supply Chain Forecasting

Researchers introduced an end-to-end framework that fine-tunes large language models (LLMs) to produce calibrated probabilistic forecasts of supply chain disruptions. The model, trained on realized outcomes, significantly outperforms strong baselines like GPT-5 on accuracy, calibration, and precision. This suggests a pathway for creating domain-specific forecasting models that generate actionable, decision-ready signals.

80% relevant

Sam Altman Hints at OpenAI Acquisition Targeting 'Mixture' of Product Company and Research Lab

In an interview, OpenAI CEO Sam Altman indicated the company is considering an acquisition that looks like 'a mixture' of both a product company and a research lab. This suggests a strategic move to acquire teams that can both advance AI capabilities and rapidly productize them.

93% relevant

Agentic AI Systems Failing in Production: New Research Reveals Benchmark Gaps

New research reveals that agentic AI systems are failing in production environments in ways not captured by current benchmarks, including alignment drift and context loss during handoffs between agents.

87% relevant

Google Research Publishes TurboQuant Paper, Claiming 80% AI Cost Reduction

Google Research has published a technical paper introducing TurboQuant, a new AI model quantization method that reportedly reduces memory usage by 6x and could cut AI inference costs by 80%. The research suggests significant implications for AI infrastructure economics and hardware investment strategies.

85% relevant

Stanford and Harvard Researchers Publish Significant AI Safety Paper on Mechanistic Interpretability

Researchers from Stanford and Harvard have published a notable AI paper focusing on mechanistic interpretability and AI safety, with implications for understanding and securing advanced AI systems.

87% relevant

Microsoft Copilot Researcher Adopts Two-Model System: OpenAI GPT Drafts, Anthropic Claude Audits

Microsoft has restructured its Copilot Researcher agent into a two-model system, using OpenAI's GPT for drafting and Anthropic's Claude for auditing. This hybrid approach aims to improve accuracy by separating generation from verification.

85% relevant

Stop Using Elaborate Personas: Research Shows They Degrade Claude Code Output

Scientific research reveals common Claude Code prompting practices—like elaborate personas and multi-agent teams—are measurably wrong and hurt performance.

100% relevant

AI Researcher Kimmonismus Predicts AGI Within 6-12 Months, Widespread Worker Replacement in 1-2 Years

Independent AI researcher Kimmonismus predicts AGI will arrive within 6-12 months, with widespread worker displacement following in 1-2 years. The forecast, shared on X, adds to a growing chorus of near-term AGI predictions from industry figures.

85% relevant

New Research Proposes a Training-Free Method to Estimate Accuracy Limits for Sequential Recommenders

Researchers propose an entropy-based, model-agnostic estimator to quantify the intrinsic accuracy ceiling of sequential recommendation tasks. This allows teams to assess dataset difficulty and potential model headroom before development, and can guide data-centric decisions like user stratification.

98% relevant

CMU Research Identifies 'Biggest Unlock' for Coding Agents: Strategic Test Execution

New research from Carnegie Mellon University suggests the key advancement for AI coding agents lies not in raw code generation, but in developing strategies for how to run and interpret tests. This shifts focus from LLM capability to agentic reasoning.

87% relevant

Open-Sourced 'AI Investment Team' Agent Framework Released for Stock Research and Portfolio Management

An anonymous developer has open-sourced a multi-agent AI framework designed to automate stock research, market analysis, and portfolio management. The release adds to a growing trend of specialized, open-source financial AI tools.

91% relevant

New Research Proposes FilterRAG and ML-FilterRAG to Defend Against Knowledge Poisoning Attacks in RAG Systems

Researchers propose two novel defense methods, FilterRAG and ML-FilterRAG, to mitigate 'PoisonedRAG' attacks where adversaries inject malicious texts into a knowledge source to manipulate an LLM's output. The defenses identify and filter adversarial content, maintaining performance close to clean RAG systems.

92% relevant

Research: Cheaper Reasoning Models Can Cost 3x More Due to Higher Error Rates and Retry Loops

New research indicates that selecting AI models based solely on per-token pricing can be a false economy. Models with lower accuracy often require multiple expensive retries, ultimately increasing total costs by up to 300%.

87% relevant

Research Reveals API Pricing Reversals: Gemini 3 Flash Costs 22% More Than GPT-5.2 Despite 78% Cheaper List Price

New research shows 21.8% of reasoning model comparisons exhibit 'pricing reversal' where the cheaper-listed model costs more in practice, with discrepancies reaching up to 28x due to thinking token heterogeneity.

95% relevant

Stanford Researchers Adapt Robot Arm VLA Model for Autonomous Drone Flight

Stanford researchers demonstrated that a Vision-Language-Action model trained for robot arm manipulation can be adapted to control autonomous drones. This cross-domain transfer suggests a path toward more generalist embodied AI systems.

85% relevant

China Surpasses US in AI Research Authorship with 2,152 First-Author Researchers in 2024

China now leads the US in first-author AI research contributions, with 2,152 researchers versus 1,810. This marks the first time China has overtaken the US in this key metric of research leadership.

87% relevant

Fine-Tuning LLMs While You Sleep: How Autoresearch and Red Hat Training Hub Outperformed the HINT3 Benchmark

Automated fine-tuning tools now let you run hundreds of training experiments overnight for under $50. Here's how Autoresearch and Red Hat's platform outperformed HINT3, and the tools you can use today.

100% relevant

Researchers Train LLM from Scratch on 28,000 Victorian-Era Texts, Creating Historical Dialogue AI

Researchers have created a specialized LLM trained exclusively on 28,000 British texts from 1837-1899, enabling historically accurate Victorian-era dialogue generation. Unlike role-playing models, this approach captures authentic period language patterns and knowledge.

87% relevant

Claude Code's New Cybersecurity Guardrails: How to Keep Your Security Research Flowing

Claude Opus 4.6 is now aggressively blocking cybersecurity prompts. Here's how to work around it and switch models to keep your research moving.

100% relevant

Substack MCP Plus: Research Any Niche Before You Write a Line of Code

Install the Substack MCP server to turn Claude Code into a research assistant that analyzes publications, discovers writers, and inspects posts to validate project ideas.

70% relevant

Ex-OpenAI Researcher Daniel Kokotajlo Puts 70% Probability on AI-Caused Human Extinction by 2029

Former OpenAI governance researcher Daniel Kokotajlo publicly estimates a 70% chance of AI leading to human extinction within approximately five years. The claim, made in a recent interview, adds a stark numerical prediction to ongoing AI safety debates.

87% relevant

Google Researchers Challenge Singularity Narrative: Intelligence Emerges from Social Systems, Not Individual Minds

Google researchers argue AI's intelligence explosion will be social, not individual, observing frontier models like DeepSeek-R1 spontaneously develop internal 'societies of thought.' This reframes scaling strategy from bigger models to richer multi-agent systems.

87% relevant