research
30 articles about research in AI news
OpenAI Agents Now Ask Questions Good Enough for Research Papers
Sébastien Bubeck revealed on the OpenAI Podcast that internal AI agents now ask research questions so insightful they're inspiring papers and correcting published mistakes, with a 1-2 year timeline for full researcher-level capabilities.
Google Launches Deep Research Max Agent on Gemini 3.1 Pro
Google DeepMind rolled out Deep Research Max and standard Deep Research agents on Gemini 3.1 Pro, enabling autonomous web and proprietary data research via the Gemini API. The Max variant uses extended test-time compute for thorough asynchronous reports.
AI Agents Now Training Other AI Models, Sparking Autoresearch Trend
AI agents are now being used to train other AI models, creating advanced agentic systems. This development stems from Andrej Karpathy's autoresearch repository and represents early-stage automation of AI research.
New Research Models 'Exploration Saturation' in Recommender Systems
A research paper analyzes 'exploration saturation'—the point where more diverse recommendations hurt user utility. Findings show this saturation point is user-dependent, challenging the standard practice of applying uniform fairness or novelty pressure across all users.
NVIDIA Research Shows AI Can Optimize Decades-Old EDA Tools Like ABC
New NVIDIA research indicates AI can be used to optimize Electronic Design Automation (EDA) tools, such as the classic ABC system, which have been manually tuned by engineers for decades. This could automate a core, labor-intensive bottleneck in semiconductor design.
Anthropic Launches STEM Fellows Program to Pair Experts with AI Research
Anthropic announced the Anthropic STEM Fellows Program, a new initiative to bring science and engineering experts into its research teams for collaborative, months-long projects aimed at accelerating progress with AI.
Codex 'Chronicle' Research Preview Adds Memory for Daily Developer Context
A research preview of 'Chronicle' for Codex has been released. It enables the AI coding assistant to accumulate memories from a developer's daily workflow to improve context.
PRL-Bench: LLMs Score Below 50% on End-to-End Physics Research Tasks
Researchers introduced PRL-Bench, a benchmark built from 100 recent Physical Review Letters papers, testing LLMs on end-to-end physics research. Top models scored below 50%, exposing a significant capability gap for autonomous scientific discovery.
Researchers Achieve Ultra-Long-Horizon Agentic Science with Cohesive AI Agents
A research team has developed AI agents capable of executing and maintaining coherent, long-horizon scientific research workflows. This addresses a core challenge in creating autonomous systems for complex discovery.
Prince Canuma's M3 Ultra 512GB & RTX Pro 6000 Setup for MLX Research
Independent developer Prince Canuma has assembled a powerful, community-sponsored home compute cluster for MLX research and model porting, featuring an M3 Ultra with 512GB RAM and an RTX Pro 6000.
Google DeepMind Researcher: LLMs Can Never Achieve Consciousness
A Google DeepMind researcher has publicly argued that large language models, by their algorithmic nature, can never become conscious, regardless of scale or time. This stance challenges a core speculative narrative in AI discourse.
HUOZIIME: A Research Framework for On-Device LLM-Powered Input Methods
A new research paper introduces HUOZIIME, a personalized on-device input method powered by a lightweight LLM. It uses a hierarchical memory mechanism to capture user-specific input history, enabling privacy-preserving, real-time text generation tailored to individual writing styles.
MiniMax Launches MaxHermes, Cloud-Hosted Agent with NousResearch
MiniMax has launched MaxHermes, a cloud-hosted version of the Hermes agent framework, in partnership with NousResearch. This provides a managed service for users of MiniMax's M2.7 model, aiming to simplify agent deployment.
New Research Proposes Collaborative Contrastive Network for Generalizable
Researchers propose the Collaborative Contrastive Network (CCN) to solve Trigger-Induced Recommendation challenges in ephemeral e-commerce scenarios like Black Friday. Instead of modeling ambiguous intent, CCN learns context-specific preferences from user-trigger pairs via novel contrastive signals. In online A/B tests on Taobao, CCN increased CTR by 12.3% and order volume by 12.7% in unseen scenarios.
New Research Proposes Lightweight Method to Fix Stale Semantic IDs in
Researchers propose a method to update 'stale' Semantic IDs in generative retrieval systems without full retraining. Their alignment technique improves key metrics and reduces compute costs by ~8-9x, addressing a core challenge in dynamic recommendation environments.
Shopify Engineering Teases 'Autoresearch' Beyond Model Training in 2026 Preview
Shopify Engineering has previewed a 2026 perspective suggesting 'autoresearch'—automated research processes—will have applications extending beyond just training AI models. This signals a broader operational automation strategy for the e-commerce giant.
AI Research Suggests Whale 'Vowels' in Sperm Whale Communication
AI researchers analyzing sperm whale vocalizations have identified combinatorial structures that function like vowels, marking a step toward decoding cetacean communication.
Tsinghua Researchers Diagnose On-Policy Distillation Failures, Propose Fixes
Researchers from Tsinghua University have pinpointed two necessary conditions for successful on-policy distillation: compatible thinking patterns and novel teacher capabilities. They propose two recovery methods to salvage failing distillation runs.
New Research Proposes Profiler and DAVINCI for Scalable
Researchers propose Profiler, a non-learnable module to efficiently capture human citation patterns, and DAVINCI, a reranking model that integrates these patterns with semantic data. They also introduce a strict inductive evaluation setting to better simulate real-world recommendation scenarios, achieving state-of-the-art results.
Anthropic's AI Researchers Outperform Humans, Discover Novel Science
Anthropic reports its AI systems for alignment research are surpassing human scientists in performance and generating novel scientific concepts, broadening the exploration space for AI safety.
AI Agent Research Faces Human Evaluation Bottleneck
A prominent AI researcher argues that human-based evaluation is fundamentally flawed for testing autonomous AI agents, as humans cannot perceive or replicate agent logic, creating a major research bottleneck.
Researchers Study AI Mental Health Risks Using Simulated Teen 'Bridget'
A research team created a ChatGPT account for a simulated 13-year-old girl named 'Bridget' to study AI interaction risks with depressed, lonely teens. The experiment underscores urgent safety and ethical questions for generative AI developers.
Google DeepMind Hires Philosopher Henry Shevlin for AI Consciousness Research
Google DeepMind has hired philosopher Henry Shevlin to treat machine consciousness as a live research problem, focusing on AI inner states, human-AI relations, and governance. This marks a strategic pivot toward understanding what advanced AI systems might become, not just what they can do.
New Research Proposes DITaR Method to Defend Sequential Recommenders
Researchers propose DITaR, a dual-view method to detect and rectify harmful fake orders embedded in user sequences. It aims to protect recommendation integrity while preserving useful data, showing superior performance in experiments. This addresses a critical vulnerability in e-commerce and retail AI systems.
MIA Agent Enables 7B Models to Outperform GPT-5.4 on Research Tasks
Researchers introduced MIA, a Manager-Planner-Executor framework that transforms 7B parameter models into active research strategists. The system reportedly outperforms GPT-5.4 through continual learning during task execution.
PetClaw AI Agent Automates Research Stack, Replaces $200/Month Tools
A developer claims PetClaw's desktop AI agent automated their entire research workflow—browsing, sourcing, dashboard building—and saved it as a reusable skill, replacing multiple paid tools. No code was written.
New Research: How Online Marketplaces Can Use Demand Allocation to Control Seller Inventory
Researchers propose a model where a marketplace platform, by controlling the timing and predictability of order allocation to sellers, can influence their safety-stock inventory and their choice to use platform fulfillment services. This identifies demand allocation as a key operational lever for digital marketplaces.
Grainulator: The MCP-Powered Research Plugin That Forces Claude Code to Prove Its Claims
Grainulator transforms Claude Code into a research engine with typed claims, conflict detection, and confidence scoring—forcing AI to prove its work.
Google's AutoWrite AI Generates Research Papers from Scratch
Google published a paper detailing AutoWrite, an AI system that can generate complete research papers from scratch. This represents a significant step toward automating the scientific writing process.
Coresight Research Report: Technology and Resilience as Path to Stronger Retail Margins
Coresight Research has published a report titled 'Supply Chain Insights for Food, Drug and Mass Retail: Technology, Resilience and the Path to Stronger Margins.' The research focuses on how strategic tech adoption can fortify operations and profitability in key retail segments.