number theory

30 articles about number theory in AI news

Palantir's Alex Karp Weaponizes Critical Theory to Sell AI Ontology

A critique argues Palantir CEO Alex Karp deliberately misapplies Frankfurt School critical theory to market his company's AI platforms to governments, turning philosophical critique into a sales tool for surveillance technology.

Apr 19, 202685% relevant

When AI Agents Need to Read Minds: The Complex Reality of Theory of Mind in Multi-LLM Systems

New research reveals that adding Theory of Mind capabilities to multi-agent AI systems doesn't guarantee better coordination. The effectiveness depends on underlying LLM capabilities, creating complex interdependencies in collaborative decision-making.

Mar 3, 202685% relevant

Researchers Apply Distributed Systems Theory to LLM Teams, Revealing O(n²) Communication Bottlenecks

A new paper applies decades-old distributed computing principles to LLM multi-agent systems, finding identical coordination problems: O(n²) communication bottlenecks, straggler delays, and consistency conflicts.

Mar 15, 202685% relevant

Figure robot count surpasses human headcount for first time

Figure's robot count surpassed its human headcount for the first time, signaling a shift from R&D to deployment. Exact numbers were not disclosed.

Jun 20, 202675% relevant

Agent Psychometrics: New Framework Predicts Task-Level Success in Agentic Coding Benchmarks with 0.81 AUC

A new research paper introduces a framework using Item Response Theory and task features to predict success on individual agentic coding tasks, achieving 0.81 AUC. This enables benchmark designers to calibrate difficulty without expensive evaluations.

Apr 2, 202675% relevant

Bridging the Gap: New RL Method Delivers Stability Guarantees with Finite Data

Researchers have developed a novel reinforcement learning approach that provides probabilistic stability guarantees using only finite data samples. The method leverages Lyapunov stability theory to ensure control systems remain stable during learning, addressing a critical challenge in deploying RL for real-world applications.

Mar 3, 202675% relevant

Use MCP Inspector to Build an AI Agent Messaging Workflow

MCP Inspector lets Claude Code users replace hardcoded REST endpoints with a Discover→Plan→Execute→Observe workflow for SMS delivery—no theory, just a live BridgeXAPI server demo.

Jul 2, 202675% relevant

Continuous Semantic Caching

Researchers propose a theory-grounded semantic caching system that treats user queries as points in a continuous embedding space, using dynamic ε-net discretization and kernel ridge regression to cut inference costs and latency without switching overhead.

Apr 24, 202678% relevant

CVEs spike 3.5x after Anthropic's Mythos Preview launch

High-severity CVEs jumped 3.5x in June after Anthropic's Mythos Preview launch. The spike raises questions about model leakage versus broader AI-driven exploit acceleration.

Jul 3, 2026100% relevant

SciCode: Epoch AI Launches Benchmark Measuring AI Research Ability

Epoch AI launched SciCode benchmark testing LLMs on real research coding tasks. Top models score below 30%, exposing gap between coding benchmarks and scientific ability.

Jun 27, 202695% relevant

MA-ProofBench: GPT-5.5 Hits 16% on Math Analysis, Most Models Near 0%

MA-ProofBench, a new theorem-proving benchmark for mathematical analysis, shows GPT-5.5 achieving 16% on undergraduate problems and 5% on PhD-level, with most models near 0% on the harder set.

Jun 15, 202682% relevant

OpenAI Model Disproves Erdős Conjecture, First AI to Solve Open Math Problem

OpenAI reasoning model disproves 1946 Erdős conjecture, first AI to solve open math problem. Cross-domain proof verified by Gowers.

May 21, 202697% relevant

Agent4POI: LLM Agents Beat Static Embeddings by 23.2% on POI Rec

Agent4POI achieves 23.2% relative gain over baselines by generating context-aware POI representations at inference time, proving static embeddings insufficient.

May 18, 202676% relevant

ERA Framework Improves RAG Honesty by Modeling Knowledge Conflicts as

ERA replaces scalar confidence scores with explicit evidence distributions to distinguish between uncertainty and ambiguity in RAG systems, improving abstention behavior and calibration.

Apr 24, 202688% relevant

Moonshot AI Ships Trillion-Parameter Open Model, Matches Claude Opus on Coding

Moonshot AI released a trillion-parameter open-source model that reportedly matches Anthropic's Claude Opus on most coding benchmarks. This follows the same day Anthropic committed $25B to AWS for compute, highlighting divergent AI scaling strategies.

Apr 22, 2026100% relevant

Geoffrey Hinton: AI Breaks Historical Job Replacement Cycle

AI pioneer Geoffrey Hinton states that unlike past technological revolutions, AI can replace both physical and intellectual labor simultaneously, breaking the historical cycle of job displacement and creation.

Apr 20, 202685% relevant

AI System Re-Identifies 67% of Anonymous Users from Text for $4 Each

Researchers combined GPT-5.2, Gemini, and Grok 4.1 Fast to create an automated attack that links anonymous social media accounts to real identities with 67% accuracy at 90% precision, costing just $1-4 per identification.

Apr 15, 202695% relevant

GPT-5.4 Pro Solves 60-Year-Old Erdős Problem #1196, Finds 'Book Proof'

OpenAI's GPT-5.4 Pro solved Erdős Problem #1196, a 60-year-old conjecture on primitive sets, in ~80 minutes. The AI discovered a purely analytic proof using von Mangoldt weights, rejecting the standard probabilistic approach used by mathematicians since 1935.

Apr 15, 2026100% relevant

Entropy-Guided Branching Boosts Agent Success 15% on New SLATE E-commerce

A new paper introduces SLATE, a large-scale benchmark for evaluating tool-using AI agents, and Entropy-Guided Branching (EGB), an algorithm that improves task success rates by 15% by dynamically expanding search where the model is uncertain.

Apr 15, 202673% relevant

AI Engineer Gurisingh Turns Ed Thorp's Trading System into 10 ChatGPT Prompts

AI engineer Gurisingh has distilled the quantitative, probabilistic trading system of Ed Thorp—who beat blackjack and ran a 29-year winning hedge fund—into 10 actionable prompts for AI agents.

Apr 12, 202687% relevant

Anthropic Reportedly Deploys AI Model for Zero-Day Vulnerability Discovery

Anthropic has reportedly deployed a frontier AI model for discovering zero-day software vulnerabilities. The model is claimed to have found flaws in code audited by humans for decades.

Apr 9, 202697% relevant

OpenAI Solves Five Erdős Problems with Internal AI Model

OpenAI researchers have reportedly solved five additional unsolved Erdős problems using an internal AI model. This demonstrates significant progress in AI's ability to tackle complex, open-ended mathematical reasoning.

Apr 9, 202695% relevant

arXiv Paper Proposes Federated Multi-Agent System with AI Critics for Network Fault Analysis

A new arXiv paper introduces a collaborative control algorithm for AI agents and critics in a federated multi-agent system, providing convergence guarantees and applying it to network telemetry fault detection. The system maintains agent privacy and scales with O(m) communication overhead for m modalities.

Apr 3, 202674% relevant

arXiv Paper Proposes 'Connections' Word Game as New Benchmark for AI Agent Social Intelligence

A new arXiv preprint introduces the improvisational word game 'Connections' as a benchmark for evaluating social intelligence in AI agents. It requires agents to gauge the cognitive states of others, testing collaborative reasoning beyond individual knowledge retrieval.

Apr 2, 202688% relevant

UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems

A new arXiv paper introduces UniMixer, a unified scaling architecture for recommender systems. It bridges attention-based, TokenMixer-based, and factorization-machine-based methods into a single theoretical framework, aiming to improve parameter efficiency and scaling return on investment (ROI).

Apr 2, 202696% relevant

TPC-CMA Framework Reduces CLIP Modality Gap by 82.3%, Boosts Captioning CIDEr by 57.1%

Researchers propose TPC-CMA, a three-phase fine-tuning curriculum that reduces the modality gap in CLIP-like models by 82.3%, improving clustering ARI from 0.318 to 0.516 and captioning CIDEr by 57.1%.

Apr 2, 202674% relevant

DeepMind Secretly Assembled ~20-Person Team to Train AI for High-Frequency Trading, Aiming at Renaissance

Demis Hassabis formed a covert ~20-researcher team within DeepMind to develop AI-powered high-frequency trading algorithms, reportedly targeting rival Renaissance Technologies. Google leadership disapproved, leading to the project's quiet termination.

Apr 1, 202695% relevant

OpenAI Internal Model Reportedly Solves Three New Erdős Problems, Marking AI Advance in Pure Mathematics

An internal AI model at OpenAI has reportedly solved three previously unsolved mathematical problems from the Erdős collection. This development signals a potential leap in AI's capacity for abstract reasoning and formal theorem proving.

Apr 1, 202685% relevant

Morgan Stanley Predicts 10x Compute Spike to Double AI Intelligence, Highlights 18 GW Energy Crisis

Morgan Stanley forecasts a massive AI leap from a 10x increase in training compute, but warns of an 18-gigawatt U.S. power shortfall by 2028. The report claims GPT-5.4 matches human experts with 83% on GDPVal.

Mar 26, 202697% relevant

OpenCSF: A 1.5TB Free Computer Science Library Emerges from Unstructured Web Data

A new open-source dataset called OpenCSF has been compiled, containing 1.5TB of computer science materials scraped from public web sources. It provides a massive, free corpus for AI training and research in software engineering and CS education.

Mar 24, 202685% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety