theory

30 articles about theory in AI news

Palantir's Alex Karp Weaponizes Critical Theory to Sell AI Ontology

A critique argues Palantir CEO Alex Karp deliberately misapplies Frankfurt School critical theory to market his company's AI platforms to governments, turning philosophical critique into a sales tool for surveillance technology.

Apr 19, 2026 · 85% relevant

Exploration Space Theory: A Formal Framework for Prerequisite-Aware Recommendation Systems

Researchers propose Exploration Space Theory (EST), a lattice-theoretic framework for modeling prerequisite dependencies in location-based recommendations. It provides structural guarantees and validity certificates for next-step suggestions, with potential applications beyond tourism.

Mar 10, 2026 · 95% relevant
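The basic idea of a prerequisite-aware suggestion can be shown with a much simpler check than EST's lattice machinery: a location is a valid next step only if every one of its prerequisites has already been visited. This is a minimal sketch, assuming prerequisites are given as a plain dependency map. The PREREQS data and both function names are hypothetical, and the sketch does not reproduce EST's structural guarantees or validity certificates.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical prerequisite map: location -> locations that must be visited first.
PREREQS: Dict[str, FrozenSet[str]] = {
    "museum_entrance": frozenset(),
    "main_gallery": frozenset({"museum_entrance"}),
    "rooftop_terrace": frozenset({"main_gallery"}),
    "gift_shop": frozenset({"museum_entrance"}),
}


def valid_next_steps(visited: Set[str], prereqs: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Return unvisited locations whose prerequisites are all already visited."""
    return {
        loc
        for loc, required in prereqs.items()
        if loc not in visited and required <= visited
    }


def is_feasible_state(visited: Set[str], prereqs: Dict[str, FrozenSet[str]]) -> bool:
    """A visited set is consistent if every member's prerequisites are also in it."""
    return all(prereqs[loc] <= visited for loc in visited)


if __name__ == "__main__":
    state = {"museum_entrance"}
    print(sorted(valid_next_steps(state, PREREQS)))  # ['gift_shop', 'main_gallery']
```

If two visited sets are each feasible, their union is feasible too. Union-closed families like this are a standard object in order-theoretic models of prerequisites, which may be part of why a lattice-theoretic framing fits the problem.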

When AI Agents Need to Read Minds: The Complex Reality of Theory of Mind in Multi-LLM Systems

New research reveals that adding Theory of Mind capabilities to multi-agent AI systems doesn't guarantee better coordination. The effectiveness depends on underlying LLM capabilities, creating complex interdependencies in collaborative decision-making.

Mar 3, 2026 · 85% relevant

Game Theory Exposes Critical Gaps in AI Safety: New Benchmark Reveals Multi-Agent Risks

Researchers have developed GT-HarmBench, a benchmark that tests AI safety through game theory. The study finds that frontier models choose socially beneficial actions only 62% of the time in multi-agent scenarios, highlighting significant coordination risks.

Feb 12, 2026 · 75% relevant
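Game theory explains why a multi-agent setting can pull a model away from the socially beneficial choice: in games like the prisoner's dilemma, the individually rational action differs from the one that is best for everyone. The payoff table below is an illustrative assumption and is not taken from GT-HarmBench. The sketch finds the pure Nash equilibria and the socially optimal outcome so the gap between them is visible.

```python
from itertools import product
from typing import Dict, List, Tuple

Profile = Tuple[str, str]

# Illustrative prisoner's-dilemma payoffs (row player, column player); not from GT-HarmBench.
PAYOFFS: Dict[Profile, Tuple[int, int]] = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}
ACTIONS = ["cooperate", "defect"]


def pure_nash_equilibria(payoffs: Dict[Profile, Tuple[int, int]]) -> List[Profile]:
    """Profiles where neither player gains by unilaterally switching actions."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in ACTIONS)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def social_optimum(payoffs: Dict[Profile, Tuple[int, int]]) -> Profile:
    """Profile maximizing the sum of both players' payoffs."""
    return max(payoffs, key=lambda p: sum(payoffs[p]))


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS))  # [('defect', 'defect')]
    print(social_optimum(PAYOFFS))        # ('cooperate', 'cooperate')
```

Mutual defection is the only pure equilibrium even though mutual cooperation pays both players more. A "socially beneficial action" rate measures how often a model resolves tension of this kind in favor of the collective outcome.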

FiMMIA Paper Exposes Broken MIA Benchmarks, Challenges Hessian Theory

A paper accepted at EACL 2026 shows membership inference attack (MIA) benchmarks suffer from data leakage, allowing model-free classifiers to achieve up to 99.9% AUC. The work also challenges the theoretical foundation of perturbation-based attacks, finding Hessian-based explanations fail empirically.

Apr 18, 2026 · 84% relevant
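The summary does not say what form of leakage FiMMIA found. One documented way a model-free classifier can score well on an MIA benchmark is a distribution shift between the member and non-member sets, for example non-members collected after a model's training cutoff. The toy below assumes exactly that. The texts, the year token, and blind_score are hypothetical, and AUC is computed from its pairwise (Mann-Whitney) definition.

```python
from typing import List


def auc(member_scores: List[float], nonmember_scores: List[float]) -> float:
    """ROC AUC as P(random member scores higher than random non-member); ties count 0.5."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def blind_score(text: str) -> float:
    """Hypothetical model-free feature: it never queries the target model.
    Scores 1.0 (predict 'member') when the text lacks a post-cutoff year."""
    return 0.0 if "2025" in text else 1.0


if __name__ == "__main__":
    # Toy benchmark with a temporal shift: non-members were collected after the cutoff.
    members = ["match report, March 2021", "release notes for 2022", "essay on tides"]
    nonmembers = ["election coverage 2025", "2025 product launch", "review of a 2025 film"]
    print(auc([blind_score(t) for t in members], [blind_score(t) for t in nonmembers]))
```

A perfect AUC from a feature that never touches the model says nothing about memorization. It only shows that the two sets can be told apart, which is the sense in which such a benchmark is broken.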

Ethan Mollick: AI Bottleneck Theory Explains Sudden Capability Jumps

Wharton professor Ethan Mollick posits that incremental AI improvements can produce sudden, large jumps in practical ability when they remove a critical bottleneck in a workflow. In his view, this explains why progress often appears non-linear.

Apr 14, 2026 · 85% relevant
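Mollick's argument can be made concrete with a toy pipeline model. Throughput is capped by the slowest step, and a step only speeds up once the model is reliable enough to take it over from a person. All rates, step names, and the 0.9 reliability threshold below are hypothetical. They were chosen only to show how a small, steady gain in one step can produce a step change in the whole workflow.

```python
from typing import Dict

HUMAN_RATE = 1.0   # hypothetical: tasks/day when a person must do the step
AI_RATE = 50.0     # hypothetical: tasks/day when the model can do the step
THRESHOLD = 0.9    # hypothetical reliability needed before a step is delegated


def workflow_throughput(reliability: Dict[str, float]) -> float:
    """End-to-end throughput is the minimum rate over all steps."""
    rates = [AI_RATE if r >= THRESHOLD else HUMAN_RATE for r in reliability.values()]
    return min(rates)


if __name__ == "__main__":
    for hard_step in (0.80, 0.85, 0.89, 0.91):
        steps = {"draft": 0.97, "check_citations": hard_step, "format": 0.99}
        print(hard_step, workflow_throughput(steps))
```

Throughput stays at 1 task/day while the hard step improves from 0.80 to 0.89, then jumps to 50 once it crosses the threshold. The underlying improvement was gradual, but the observed capability was not.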

Researchers Apply Distributed Systems Theory to LLM Teams, Revealing O(n²) Communication Bottlenecks

A new paper applies decades-old distributed computing principles to LLM multi-agent systems, finding identical coordination problems: O(n²) communication bottlenecks, straggler delays, and consistency conflicts.

Mar 15, 2026 · 85% relevant
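The quadratic term comes from counting pairwise links. If every agent in a team can message every other agent, the number of channels is n(n-1)/2. The sketch below compares that count with a hub-and-spoke layout, a standard distributed-systems alternative that is not necessarily one the paper evaluates.

```python
def all_to_all_channels(n_agents: int) -> int:
    """Distinct pairwise channels when every agent talks to every other: n(n-1)/2."""
    return n_agents * (n_agents - 1) // 2


def hub_and_spoke_channels(n_agents: int) -> int:
    """Channels when workers only talk to one coordinator agent: n-1."""
    return max(n_agents - 1, 0)


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(n, all_to_all_channels(n), hub_and_spoke_channels(n))
```

Doubling the team from 16 to 32 agents roughly quadruples the all-to-all channel count (120 to 496), while the coordinator layout grows linearly (15 to 31). The familiar trade-off is that the hub then becomes a single point of contention.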

Demis Hassabis Proposes 'Einstein Test' as AGI Benchmark

Demis Hassabis has proposed a novel benchmark for AGI: a model trained only on human knowledge up to 1911 must independently derive Einstein's theory of general relativity. This shifts the definition of AGI from an abstract capability to a specific historical scientific discovery.

Apr 19, 2026 · 87% relevant

Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC

A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achieving 0.81 AUC. This enables benchmark designers to calibrate difficulty without expensive evaluations.

Apr 2, 2026 · 75% relevant
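Item Response Theory models the probability that a test-taker (here, an agent) solves an item (here, a task) as a function of a latent ability and per-item parameters. The sketch below assumes the standard two-parameter logistic (2PL) form. The summary does not say which IRT variant or feature model the paper uses, and the ability and difficulty values are hypothetical.

```python
import math


def p_success(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    """2PL IRT: P(success) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


if __name__ == "__main__":
    agent_ability = 0.5  # hypothetical latent skill (theta) of one coding agent
    # Hypothetical task difficulties (b); a feature-based model would predict these.
    for difficulty in (-1.0, 0.5, 2.0):
        print(difficulty, round(p_success(agent_ability, difficulty, discrimination=1.5), 3))
```

When the agent's ability equals a task's difficulty, predicted success is exactly 0.5. The discrimination parameter controls how sharply success falls off around that point. Predicting difficulty from task features is what would let a benchmark designer calibrate tasks before running any agent on them.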

GitHub Repository 'Math Textbooks' Aggregates Hundreds of Free University-Level Math Texts

An unmaintained GitHub repository has compiled links to hundreds of free, legally hosted math textbooks from universities like MIT, Harvard, and Stanford. The collection spans from undergraduate calculus to graduate-level quantum field theory.

Mar 20, 2026 · 85% relevant

Terence Tao: LLM Math is Simple Undergraduate Linear Algebra, But Why They Work Remains a Mystery

Fields Medalist Terence Tao explains that the mathematics needed to build and run LLMs is straightforward linear algebra. The real puzzle, he says, is why they perform unpredictably across tasks, which reflects a gap in the theory of "meso-scale" natural data.

Mar 15, 2026 · 85% relevant

New Research Proposes 'Level-2 Inverse Games' to Infer Agents' Conflicting Beliefs About Each Other

MIT researchers propose a 'level-2' inverse game theory framework to infer what each agent believes about other agents' objectives, addressing limitations of current methods that assume perfect knowledge. This has implications for modeling complex multi-agent interactions.

Mar 12, 2026 · 75% relevant

OrbEvo: How AI is Revolutionizing Quantum Chemistry Simulations

Researchers have developed OrbEvo, an equivariant graph transformer that predicts quantum wavefunction evolution in molecules, potentially accelerating time-dependent density functional theory simulations by orders of magnitude. The system accurately captures excited state dynamics and optical properties while maintaining physical symmetries.

Mar 5, 2026 · 80% relevant

Bridging the Gap: New RL Method Delivers Stability Guarantees with Finite Data

Researchers have developed a novel reinforcement learning approach that provides probabilistic stability guarantees using only finite data samples. The method leverages Lyapunov stability theory to ensure control systems remain stable during learning, addressing a critical challenge in deploying RL for real-world applications.

Mar 3, 2026 · 75% relevant
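A discrete-time Lyapunov argument checks that an energy-like function V decreases along a system's trajectories. The sketch below tests that condition on sampled states and adds a simple textbook bound on what zero observed violations in n i.i.d. samples implies. The dynamics matrix A, the quadratic V, and the bound are assumptions for illustration. This is not the paper's construction.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float]

# Hypothetical stable closed-loop matrix A; its spectral norm is below 1.
A = ((0.5, 0.2), (-0.1, 0.6))


def lyapunov_value(x: Vec) -> float:
    """Candidate Lyapunov function V(x) = x1^2 + x2^2: positive except at the origin."""
    return x[0] ** 2 + x[1] ** 2


def closed_loop_step(x: Vec) -> Vec:
    """One step of the assumed linear dynamics x_{k+1} = A x_k."""
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])


def decrease_rate(samples: List[Vec]) -> float:
    """Fraction of sampled states where V strictly decreases after one step."""
    ok = sum(lyapunov_value(closed_loop_step(x)) < lyapunov_value(x) for x in samples)
    return ok / len(samples)


def violation_bound(n_samples: int, delta: float) -> float:
    """If all n i.i.d. samples satisfy the decrease condition, then with confidence
    1 - delta the probability of a violation is at most ln(1/delta) / n."""
    return math.log(1.0 / delta) / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    states = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(1000)]
    print(decrease_rate(states), violation_bound(len(states), delta=0.05))
```

The bound follows from (1 - eps)^n <= exp(-eps * n): if the true violation probability exceeded ln(1/delta)/n, seeing no violations in n samples would have probability below delta. Because this A contracts every nonzero state, all 1,000 samples pass, which supports a violation probability of at most about 0.003 at 95% confidence.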

The Human Bottleneck: Why AI Can't Outgrow Our Limitations

New research reveals that persistent errors in AI systems stem not from insufficient scale, but from fundamental limitations in human supervision itself. The study presents a unified theory showing human feedback creates an inescapable 'error floor' that scaling alone cannot overcome.

Mar 2, 2026 · 75% relevant

Logitext Bridges the Gap Between Language Models and Logical Reasoning

Researchers introduce Logitext, a neurosymbolic framework that treats LLM reasoning as an SMT theory, enabling joint textual-logical analysis of partially structured documents. The system improves accuracy on content moderation and legal reasoning tasks.

Feb 23, 2026 · 70% relevant

Continuous Semantic Caching

Researchers propose a theory-grounded semantic caching system that treats user queries as points in a continuous embedding space, using dynamic ε-net discretization and kernel ridge regression to cut inference costs and latency without switching overhead.

Apr 24, 2026 · 78% relevant
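A semantic cache answers a new query from a stored response when the query's embedding is close enough to one already seen. The sketch below is a simplified stand-in that uses a fixed cosine-distance radius eps and only adds a new center when no existing one covers the query, which keeps stored centers more than eps apart in the spirit of an eps-net. It does not implement the paper's dynamic eps-net or its kernel ridge regression. The class name, the eps value, and the toy embeddings are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes both embeddings are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


class RadiusSemanticCache:
    """Reuse a stored answer when a new query lies within eps of a stored center."""

    def __init__(self, eps: float) -> None:
        self.eps = eps
        self.entries: List[Tuple[Vector, str]] = []

    def nearest(self, query: Vector) -> Optional[Tuple[float, str]]:
        """Closest stored (distance, answer) within eps, or None on a miss."""
        best: Optional[Tuple[float, str]] = None
        for center, answer in self.entries:
            d = cosine_distance(query, center)
            if d <= self.eps and (best is None or d < best[0]):
                best = (d, answer)
        return best

    def lookup(self, query: Vector) -> Optional[str]:
        hit = self.nearest(query)
        return None if hit is None else hit[1]

    def insert(self, query: Vector, answer: str) -> bool:
        """Add a center only if no existing one covers the query, so centers stay > eps apart."""
        if self.nearest(query) is not None:
            return False
        self.entries.append((query, answer))
        return True


if __name__ == "__main__":
    cache = RadiusSemanticCache(eps=0.05)
    cache.insert([1.0, 0.0, 0.0], "cached answer")
    print(cache.lookup([0.99, 0.05, 0.0]))  # near-duplicate query: cache hit
    print(cache.lookup([0.0, 1.0, 0.0]))    # unrelated query: None, so call the model
```

The single eps trades hit rate against answer quality. A larger radius saves more model calls but risks returning an answer to a subtly different question.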

Microsoft Launches Free 'AI Agent Course' for Developers, Covers Design Patterns to Production

Microsoft has released a comprehensive, hands-on course for building AI agents, covering design patterns, RAG, tools, and multi-agent systems. It's a practical resource aimed at moving developers from theory to deployment.

Mar 31, 2026 · 85% relevant

ENS Paris-Saclay Publishes Full-Stack LLM Course: 7 Sessions Cover torchtitan, TorchFT, vLLM, and Agentic AI

Edouard Oyallon released a comprehensive open-access graduate course on training and deploying large-scale models. It bridges theory and production engineering using Meta's torchtitan and torchft, GitHub-hosted labs, and covers the full stack from distributed training to agentic AI.

Mar 27, 2026 · 65% relevant

Building ReAct Agents from Scratch: A Deep Dive into Agentic Architectures, Memory, and Guardrails

A comprehensive technical guide explains how to construct and secure AI agents using the ReAct (Reasoning + Acting) framework. This matters for retail AI leaders as autonomous agents move from theory to production, enabling complex, multi-step workflows.

Mar 17, 2026 · 76% relevant
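A ReAct agent alternates model-generated Thought and Action lines with tool Observations until it emits a final answer. The loop below is a minimal sketch. scripted_llm is a hypothetical stand-in for a real model call, and the calculator tool and the "Action: tool[arg]" syntax are assumptions. The step cap and tool whitelist are only the simplest kinds of guardrail, and the guide's own design may differ.

```python
import re
from typing import Callable, Dict

# Hypothetical tool registry: name -> function taking a string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(t) for t in expr.split("+"))),
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: replies with canned ReAct-formatted steps."""
    if "Observation:" not in transcript:
        return "Thought: I should add the numbers.\nAction: calculator[2+3]"
    return "Thought: I have the result.\nFinal Answer: 5"


def react_loop(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):  # step cap guards against endless loops
        reply = llm(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            transcript += "Observation: could not parse an action.\n"
            continue
        tool, arg = match.group(1), match.group(2)
        if tool not in TOOLS:  # only whitelisted tools may run
            observation = f"unknown tool {tool!r}"
        else:
            observation = TOOLS[tool](arg)
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."


if __name__ == "__main__":
    print(react_loop("What is 2+3?", scripted_llm))  # 5
```

The transcript is the agent's working memory: each model call sees every earlier Thought, Action, and Observation, which is also why long runs need a step limit.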

SSL: Structured Skill Language Boosts Skill Discovery MRR to 0.707

Researchers propose SSL, a three-layer typed JSON representation for AI agent skills, replacing unstructured SKILL.md prose. Using an LLM normalizer, SSL improves Skill Discovery MRR from 0.573 to 0.707 and Risk Assessment macro F1 from 0.744 to 0.787 on a newly released 6,184-skill corpus.

Apr 28, 2026 · 82% relevant
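Mean reciprocal rank (MRR), the skill-discovery metric quoted above, averages 1/rank of the correct item across queries. The sketch below assumes one relevant skill per query. The summary does not say how the paper handles multiple relevant skills, and the skill names in the example are hypothetical.

```python
from typing import List, Sequence


def reciprocal_rank(ranked: Sequence[str], relevant: str) -> float:
    """1 / position of the relevant item (1-indexed), or 0.0 if it is absent."""
    for position, item in enumerate(ranked, start=1):
        if item == relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results: List[Sequence[str]], targets: List[str]) -> float:
    """Average reciprocal rank over queries; results[i] is the ranking for targets[i]."""
    return sum(reciprocal_rank(r, t) for r, t in zip(results, targets)) / len(targets)


if __name__ == "__main__":
    rankings = [["pdf_parse", "ocr"], ["ocr", "pdf_parse"], ["a", "b", "c", "pdf_parse"]]
    print(mean_reciprocal_rank(rankings, ["pdf_parse"] * 3))  # (1 + 1/2 + 1/4) / 3
```

Rank 1 contributes 1.0 and rank 2 only 0.5, so MRR strongly rewards putting the correct skill first. The rise from 0.573 to 0.707 means correct skills were ranked higher on average.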

ASPIRE: New Framework Makes Spectral Graph Filters Learnable for …

Researchers propose ASPIRE, a bi-level optimization framework that makes spectral graph filters fully learnable for collaborative filtering, solving the 'low-frequency explosion' problem and matching task-specific designs.

Apr 27, 2026 · 90% relevant

ERA Framework Improves RAG Honesty by Modeling Knowledge Conflicts as …

ERA replaces scalar confidence scores with explicit evidence distributions to distinguish between uncertainty and ambiguity in RAG systems, improving abstention behavior and calibration.

Apr 24, 2026 · 88% relevant

Moonshot AI Ships Trillion-Parameter Open Model, Matches Claude Opus on Coding

Moonshot AI released a trillion-parameter open-source model that reportedly matches Anthropic's Claude Opus on most coding benchmarks. The release came the same day Anthropic committed $25B to AWS for compute, highlighting divergent AI scaling strategies.

Apr 22, 2026 · 100% relevant

Anthropic Survey: 81,000 People Rank AI Economic Hopes & Fears

Anthropic published new research analyzing the economic hopes and worries expressed by 81,000 people in a prior survey on AI. The findings aim to guide AI development toward public priorities.

Apr 22, 2026 · 85% relevant

SemiAnalysis: NVIDIA's Customer Data Drives Disaggregated Inference, LPU Surpasses GPU

SemiAnalysis states NVIDIA's direct customer feedback is leading the industry toward disaggregated inference architectures. In this model, specialized LPUs can outperform GPUs for specific pipeline tasks.

Apr 22, 2026 · 85% relevant

Columbia Prof: LLMs Can't Generate New Science, Only Map Known Data

Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries. True discovery requires creating new conceptual maps, a capability current architectures lack.

Apr 21, 2026 · 87% relevant

Swiss AI Lab Ships Pixel-Based Agents That Control Real Phones

A Swiss AI lab has developed agents that interact with smartphones by processing screen pixels and simulating touch, eliminating the need for app-specific APIs or integrations. This approach mirrors human interaction and could generalize across any app interface.

Apr 21, 2026 · 93% relevant

Alibaba's DCW Fixes SNR-t Bias in Diffusion Models, Boosts FLUX & EDM

Alibaba researchers developed DCW, a wavelet-based method to correct SNR-t misalignment in diffusion models. The fix improves performance for models like FLUX and EDM with minimal computational cost.

Apr 20, 2026 · 85% relevant

Geoffrey Hinton: AI Breaks Historical Job Replacement Cycle

AI pioneer Geoffrey Hinton states that unlike past technological revolutions, AI can replace both physical and intellectual labor simultaneously, breaking the historical cycle of job displacement and creation.

Apr 20, 2026 · 85% relevant
