theory
30 articles about theory in AI news
Palantir's Alex Karp Weaponizes Critical Theory to Sell AI Ontology
A critique argues Palantir CEO Alex Karp deliberately misapplies Frankfurt School critical theory to market his company's AI platforms to governments, turning philosophical critique into a sales tool for surveillance technology.
Exploration Space Theory: A Formal Framework for Prerequisite-Aware Recommendation Systems
Researchers propose Exploration Space Theory (EST), a lattice-theoretic framework for modeling prerequisite dependencies in location-based recommendations. It provides structural guarantees and validity certificates for next-step suggestions, with potential applications beyond tourism.
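The paper's lattice-theoretic machinery isn't reproduced here, but the core idea of prerequisite-aware recommendation can be sketched as a validity filter over a dependency relation (all names below are illustrative, not from the paper):

```python
# Illustrative sketch: prerequisite-aware filtering of next-step candidates.
# The prerequisite relation is a dict mapping each step to its required steps;
# a candidate is "valid" only if every prerequisite is already visited.

def valid_next_steps(prereqs, visited, candidates):
    """Return candidates whose prerequisites are all satisfied."""
    return [c for c in candidates
            if prereqs.get(c, set()) <= set(visited)]

# Toy tour example: the museum requires visiting the ticket office first.
prereqs = {"museum": {"ticket_office"}, "tower": set()}
print(valid_next_steps(prereqs, ["ticket_office"], ["museum", "tower"]))
# -> ['museum', 'tower']
print(valid_next_steps(prereqs, [], ["museum", "tower"]))
# -> ['tower']
```

A validity certificate in this spirit is just the witness that each recommended step's prerequisite set is contained in the visited set.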
When AI Agents Need to Read Minds: The Complex Reality of Theory of Mind in Multi-LLM Systems
New research reveals that adding Theory of Mind capabilities to multi-agent AI systems doesn't guarantee better coordination. The effectiveness depends on underlying LLM capabilities, creating complex interdependencies in collaborative decision-making.
Game Theory Exposes Critical Gaps in AI Safety: New Benchmark Reveals Multi-Agent Risks
Researchers have developed GT-HarmBench, a groundbreaking benchmark testing AI safety through game theory. The study reveals frontier models choose socially beneficial actions only 62% of the time in multi-agent scenarios, highlighting significant coordination risks.
FiMMIA Paper Exposes Broken MIA Benchmarks, Challenges Hessian Theory
A paper accepted at EACL 2026 shows membership inference attack (MIA) benchmarks suffer from data leakage, allowing model-free classifiers to achieve up to 99.9% AUC. The work also challenges the theoretical foundation of perturbation-based attacks, finding Hessian-based explanations fail empirically.
Ethan Mollick: AI Bottleneck Theory Explains Sudden Capability Jumps
Wharton professor Ethan Mollick posits that incremental AI improvements can cause sudden, large jumps in practical ability when they remove a critical bottleneck in a workflow. This explains why progress often appears non-linear.
Researchers Apply Distributed Systems Theory to LLM Teams, Revealing O(n²) Communication Bottlenecks
A new paper applies decades-old distributed computing principles to LLM multi-agent systems, finding identical coordination problems: O(n²) communication bottlenecks, straggler delays, and consistency conflicts.
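The O(n²) figure follows from all-to-all messaging: if each of n agents communicates with every other agent per coordination round, cost grows quadratically. A toy illustration (not code from the paper):

```python
# All-to-all coordination cost: each of n agents messages every other agent,
# so one round costs n * (n - 1) directed messages -- O(n^2) growth.

def messages_per_round(n_agents: int) -> int:
    return n_agents * (n_agents - 1)

for n in (2, 4, 8, 16):
    print(n, messages_per_round(n))
# Doubling the team size roughly quadruples the communication volume,
# which is why hierarchical or hub-and-spoke topologies are the classic
# distributed-systems mitigation.
```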
Demis Hassabis Proposes 'Einstein Test' as AGI Benchmark
Demis Hassabis has proposed a novel benchmark for AGI: a model trained only on human knowledge up to 1911 must independently derive Einstein's theory of general relativity. This shifts the definition of AGI from abstract capability to a specific, historical scientific discovery.
Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC
A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achieving 0.81 AUC. This enables benchmark designers to calibrate difficulty without expensive evaluations.
GitHub Repository 'Math Textbooks' Aggregates Hundreds of Free University-Level Math Texts
An unmaintained GitHub repository has compiled links to hundreds of free, legally-hosted math textbooks from universities like MIT, Harvard, and Stanford. The collection spans from undergraduate calculus to graduate-level quantum field theory.
Terence Tao: LLM Math is Simple Undergraduate Linear Algebra, But Why They Work Remains a Mystery
Fields Medalist Terence Tao explains that the mathematics to build and run LLMs is straightforward linear algebra. The real puzzle is why they perform unpredictably across tasks, a gap in theory for 'meso-scale' natural data.
New Research Proposes 'Level-2 Inverse Games' to Infer Agents' Conflicting Beliefs About Each Other
MIT researchers propose a 'level-2' inverse game theory framework to infer what each agent believes about other agents' objectives, addressing limitations of current methods that assume perfect knowledge. This has implications for modeling complex multi-agent interactions.
OrbEvo: How AI is Revolutionizing Quantum Chemistry Simulations
Researchers have developed OrbEvo, an equivariant graph transformer that predicts quantum wavefunction evolution in molecules, potentially accelerating time-dependent density functional theory simulations by orders of magnitude. The system accurately captures excited state dynamics and optical properties while maintaining physical symmetries.
Bridging the Gap: New RL Method Delivers Stability Guarantees with Finite Data
Researchers have developed a novel reinforcement learning approach that provides probabilistic stability guarantees using only finite data samples. The method leverages Lyapunov stability theory to ensure control systems remain stable during learning, addressing a critical challenge in deploying RL for real-world applications.
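The Lyapunov idea behind such guarantees: find a scalar "energy" function V that strictly decreases along the closed-loop system's trajectories; verifying that decrease on finitely many samples yields a probabilistic certificate. A toy empirical check on an assumed stable scalar system (not the paper's setting):

```python
# Toy empirical Lyapunov check: for the scalar system x_{t+1} = 0.9 * x_t
# (a stable example chosen for illustration), the candidate Lyapunov
# function V(x) = x^2 should strictly decrease at every nonzero sample.

def step(x: float) -> float:
    return 0.9 * x

def V(x: float) -> float:
    return x * x

samples = [2.0, -1.5, 0.7, 3.2]
decreases = all(V(step(x)) < V(x) for x in samples)
print(decreases)  # -> True
```

The hard part the paper addresses is turning such finite-sample checks into a rigorous probabilistic stability guarantee while the policy is still learning.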
The Human Bottleneck: Why AI Can't Outgrow Our Limitations
New research reveals that persistent errors in AI systems stem not from insufficient scale, but from fundamental limitations in human supervision itself. The study presents a unified theory showing human feedback creates an inescapable 'error floor' that scaling alone cannot overcome.
Logitext Bridges the Gap Between Language Models and Logical Reasoning
Researchers introduce Logitext, a neurosymbolic framework that treats LLM reasoning as an SMT theory, enabling joint textual-logical analysis of partially structured documents. The system improves accuracy on content moderation and legal reasoning tasks.
Continuous Semantic Caching
Researchers propose a theory-grounded semantic caching system that treats user queries as points in a continuous embedding space, using dynamic ε-net discretization and kernel ridge regression to cut inference costs and latency without switching overhead.
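The mechanism can be sketched as nearest-neighbor lookup in embedding space: if a new query's embedding falls within distance ε of a cached query, the cached answer is reused instead of calling the model. A toy pure-Python sketch (the paper's ε-net and kernel-ridge details are omitted; the embeddings here are hand-made vectors):

```python
import math

# Toy semantic cache: reuse a cached answer when the query embedding lies
# within epsilon (Euclidean distance) of a previously cached embedding.

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

class SemanticCache:
    def __init__(self, epsilon: float):
        self.epsilon = epsilon
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, emb):
        for cached_emb, answer in self.entries:
            if dist(emb, cached_emb) <= self.epsilon:
                return answer  # cache hit: skip model inference
        return None           # cache miss: caller invokes the model

    def put(self, emb, answer):
        self.entries.append((emb, answer))

cache = SemanticCache(epsilon=0.2)
cache.put([1.0, 0.0], "Paris")
print(cache.get([0.9, 0.1]))   # near-duplicate query -> "Paris"
print(cache.get([0.0, 1.0]))   # unrelated query -> None
```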
Microsoft Launches Free 'AI Agent Course' for Developers, Covers Design Patterns to Production
Microsoft has released a comprehensive, hands-on course for building AI agents, covering design patterns, RAG, tools, and multi-agent systems. It's a practical resource aimed at moving developers from theory to deployment.
ENS Paris-Saclay Publishes Full-Stack LLM Course: 7 Sessions Cover torchtitan, TorchFT, vLLM, and Agentic AI
Edouard Oyallon released a comprehensive open-access graduate course on training and deploying large-scale models. It bridges theory and production engineering using Meta's torchtitan and torchft, GitHub-hosted labs, and covers the full stack from distributed training to agentic AI.
Building ReAct Agents from Scratch: A Deep Dive into Agentic Architectures, Memory, and Guardrails
A comprehensive technical guide explains how to construct and secure AI agents using the ReAct (Reasoning + Acting) framework. This matters for retail AI leaders as autonomous agents move from theory to production, enabling complex, multi-step workflows.
SSL: Structured Skill Language Boosts Skill Discovery MRR to 0.707
Researchers propose SSL, a three-layer typed JSON representation for AI agent skills, replacing unstructured SKILL.md prose. Using an LLM normalizer, SSL improves Skill Discovery MRR from 0.573 to 0.707 and Risk Assessment macro F1 from 0.744 to 0.787 on a newly released 6,184-skill corpus.
ASPIRE: New Framework Makes Spectral Graph Filters Learnable for Collaborative Filtering
Researchers propose ASPIRE, a bi-level optimization framework that makes spectral graph filters fully learnable for collaborative filtering, solving the 'low-frequency explosion' problem and matching task-specific designs.
ERA Framework Improves RAG Honesty by Modeling Knowledge Conflicts as Evidence Distributions
ERA replaces scalar confidence scores with explicit evidence distributions to distinguish between uncertainty and ambiguity in RAG systems, improving abstention behavior and calibration.
Moonshot AI Ships Trillion-Parameter Open Model, Matches Claude Opus on Coding
Moonshot AI released a trillion-parameter open-source model that reportedly matches Anthropic's Claude Opus on most coding benchmarks. This follows the same day Anthropic committed $25B to AWS for compute, highlighting divergent AI scaling strategies.
Anthropic Survey: 81,000 People Rank AI Economic Hopes & Fears
Anthropic published new research analyzing the economic hopes and worries expressed by 81,000 people in a prior survey on AI. The findings aim to guide AI development toward public priorities.
SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU
SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.
Columbia Prof: LLMs Can't Generate New Science, Only Map Known Data
Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries. True discovery requires creating new conceptual maps, a capability current architectures lack.
Swiss AI Lab Ships Pixel-Based Agents That Control Real Phones
A Swiss AI lab has developed agents that interact with smartphones by processing screen pixels and simulating touch, eliminating the need for app-specific APIs or integrations. This approach mirrors human interaction and could generalize across any app interface.
Alibaba's DCW Fixes SNR-t Bias in Diffusion Models, Boosts FLUX & EDM
Alibaba researchers developed DCW, a wavelet-based method to correct SNR-t misalignment in diffusion models. The fix improves performance for models like FLUX and EDM with minimal computational cost.
Geoffrey Hinton: AI Breaks Historical Job Replacement Cycle
AI pioneer Geoffrey Hinton states that unlike past technological revolutions, AI can replace both physical and intellectual labor simultaneously, breaking the historical cycle of job displacement and creation.