number theory
30 articles about number theory in AI news
Palantir's Alex Karp Weaponizes Critical Theory to Sell AI Ontology
A critique argues Palantir CEO Alex Karp deliberately misapplies Frankfurt School critical theory to market his company's AI platforms to governments, turning philosophical critique into a sales tool for surveillance technology.
When AI Agents Need to Read Minds: The Complex Reality of Theory of Mind in Multi-LLM Systems
New research reveals that adding Theory of Mind capabilities to multi-agent AI systems doesn't guarantee better coordination. The effectiveness depends on underlying LLM capabilities, creating complex interdependencies in collaborative decision-making.
Researchers Apply Distributed Systems Theory to LLM Teams, Revealing O(n²) Communication Bottlenecks
A new paper applies decades-old distributed computing principles to LLM multi-agent systems, finding identical coordination problems: O(n²) communication bottlenecks, straggler delays, and consistency conflicts.
Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC
A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achieving 0.81 AUC. This enables benchmark designers to calibrate difficulty without expensive evaluations.
Bridging the Gap: New RL Method Delivers Stability Guarantees with Finite Data
Researchers have developed a novel reinforcement learning approach that provides probabilistic stability guarantees using only finite data samples. The method leverages Lyapunov stability theory to ensure control systems remain stable during learning, addressing a critical challenge in deploying RL for real-world applications.
Continuous Semantic Caching
Researchers propose a theory-grounded semantic caching system that treats user queries as points in a continuous embedding space, using dynamic ε-net discretization and kernel ridge regression to cut inference costs and latency without switching overhead.
OpenAI Model Disproves Erdős Conjecture, First AI to Solve Open Math Problem
OpenAI reasoning model disproves 1946 Erdős conjecture, first AI to solve open math problem. Cross-domain proof verified by Gowers.
Agent4POI: LLM Agents Beat Static Embeddings by 23.2% on POI Rec
Agent4POI achieves 23.2% relative gain over baselines by generating context-aware POI representations at inference time, proving static embeddings insufficient.
ERA Framework Improves RAG Honesty by Modeling Knowledge Conflicts as
ERA replaces scalar confidence scores with explicit evidence distributions to distinguish between uncertainty and ambiguity in RAG systems, improving abstention behavior and calibration.
Moonshot AI Ships Trillion-Parameter Open Model, Matches Claude Opus on Coding
Moonshot AI released a trillion-parameter open-source model that reportedly matches Anthropic's Claude Opus on most coding benchmarks. This follows the same day Anthropic committed $25B to AWS for compute, highlighting divergent AI scaling strategies.
Geoffrey Hinton: AI Breaks Historical Job Replacement Cycle
AI pioneer Geoffrey Hinton states that unlike past technological revolutions, AI can replace both physical and intellectual labor simultaneously, breaking the historical cycle of job displacement and creation.
AI System Re-Identifies 67% of Anonymous Users from Text for $4 Each
Researchers combined GPT-5.2, Gemini, and Grok 4.1 Fast to create an automated attack that links anonymous social media accounts to real identities with 67% accuracy at 90% precision, costing just $1-4 per identification.
GPT-5.4 Pro Solves 60-Year-Old Erdős Problem #1196, Finds 'Book Proof'
OpenAI's GPT-5.4 Pro solved Erdős Problem #1196, a 60-year-old conjecture on primitive sets, in ~80 minutes. The AI discovered a purely analytic proof using von Mangoldt weights, rejecting the standard probabilistic approach used by mathematicians since 1935.
Entropy-Guided Branching Boosts Agent Success 15% on New SLATE E-commerce
A new paper introduces SLATE, a large-scale benchmark for evaluating tool-using AI agents, and Entropy-Guided Branching (EGB), an algorithm that improves task success rates by 15% by dynamically expanding search where the model is uncertain.
AI Engineer Gurisingh Turns Ed Thorp's Trading System into 10 ChatGPT Prompts
AI engineer Gurisingh has distilled the quantitative, probabilistic trading system of Ed Thorp—who beat blackjack and ran a 29-year winning hedge fund—into 10 actionable prompts for AI agents.
Anthropic Reportedly Deploys AI Model for Zero-Day Vulnerability Discovery
Anthropic has reportedly deployed a frontier AI model for discovering zero-day software vulnerabilities. The model is claimed to have found flaws in code audited by humans for decades.
OpenAI Solves Five Erdős Problems with Internal AI Model
OpenAI researchers have reportedly solved five additional unsolved Erdős problems using an internal AI model. This demonstrates significant progress in AI's ability to tackle complex, open-ended mathematical reasoning.
arXiv Paper Proposes Federated Multi-Agent System with AI Critics for Network Fault Analysis
A new arXiv paper introduces a collaborative control algorithm for AI agents and critics in a federated multi-agent system, providing convergence guarantees and applying it to network telemetry fault detection. The system maintains agent privacy and scales with O(m) communication overhead for m modalities.
UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems
A new arXiv paper introduces UniMixer, a unified scaling architecture for recommender systems. It bridges attention-based, TokenMixer-based, and factorization-machine-based methods into a single theoretical framework, aiming to improve parameter efficiency and scaling return on investment (ROI).
arXiv Paper Proposes 'Connections' Word Game as New Benchmark for AI Agent Social Intelligence
A new arXiv preprint introduces the improvisational word game 'Connections' as a benchmark for evaluating social intelligence in AI agents. It requires agents to gauge the cognitive states of others, testing collaborative reasoning beyond individual knowledge retrieval.
TPC-CMA Framework Reduces CLIP Modality Gap by 82.3%, Boosts Captioning CIDEr by 57.1%
Researchers propose TPC-CMA, a three-phase fine-tuning curriculum that reduces the modality gap in CLIP-like models by 82.3%, improving clustering ARI from 0.318 to 0.516 and captioning CIDEr by 57.1%.
DeepMind Secretly Assembled ~20-Person Team to Train AI for High-Frequency Trading, Aiming at Renaissance
Demis Hassabis formed a covert ~20-researcher team within DeepMind to develop AI-powered high-frequency trading algorithms, reportedly targeting rival Renaissance Technologies. Google leadership disapproved, leading to the project's quiet termination.
OpenAI Internal Model Reportedly Solves Three New Erdős Problems, Marking AI Advance in Pure Mathematics
An internal AI model at OpenAI has reportedly solved three previously unsolved mathematical problems from the Erdős collection. This development signals a potential leap in AI's capacity for abstract reasoning and formal theorem proving.
Morgan Stanley Predicts 10x Compute Spike to Double AI Intelligence, Highlights 18 GW Energy Crisis
Morgan Stanley forecasts a massive AI leap from a 10x increase in training compute, but warns of an 18-gigawatt U.S. power shortfall by 2028. The report claims GPT-5.4 matches human experts with 83% on GDPVal.
OpenCSF: A 1.5TB Free Computer Science Library Emerges from Unstructured Web Data
A new open-source dataset called OpenCSF has been compiled, containing 1.5TB of computer science materials scraped from public web sources. It provides a massive, free corpus for AI training and research in software engineering and CS education.
AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions
Researchers introduce AgentComm-Bench, a benchmark that stress-tests multi-agent embodied AI systems under six real-world network impairments. It reveals performance drops of over 96% in navigation and 85% in perception F1, highlighting a critical gap between lab evaluations and deployable systems.
Terence Tao on AI and Mathematics: Collaboration, Not Replacement, for Problems Like Riemann Hypothesis
Fields Medalist Terence Tao suggests AI may not solve problems like the Riemann Hypothesis alone, but through a 'collaboration we can't yet imagine' blending AI power with human insight.
Terence Tao: AI's 'Brute-Test' Approach to Math Research Could Narrow Human Efficiency Gap
Mathematician Terence Tao observes AI can synthesize millions of papers and brute-force test ideas, while humans rely on pattern recognition from few examples. He suggests the gap may narrow as AI systems develop world models, causal reasoning, and active learning.
New Research Reveals Fundamental Limitations of Vector Embeddings for Retrieval
A new theoretical paper demonstrates that embedding-based retrieval systems have inherent limitations in representing complex relevance relationships, even with simple queries. This challenges the assumption that better training data alone can solve all retrieval problems.
Terence Tao Reveals AI's Mathematical Breakthroughs: Unique Proofs Emerge from Machine Intelligence
Fields Medalist Terence Tao reports that AI systems are now generating unique mathematical proofs that human mathematicians find genuinely novel and interesting, marking a significant milestone in AI's intellectual capabilities.