theorem proving
14 articles about theorem proving in AI news
The Power of Simplicity: How Minimalist AI Agents Are Revolutionizing Automated Theorem Proving
New research challenges the prevailing wisdom that complex AI systems are necessary for sophisticated tasks like automated theorem proving. A deliberately minimalist agent architecture demonstrates that streamlined approaches can achieve competitive performance while improving reproducibility and efficiency.
OpenAI Internal Model Reportedly Solves Three New Erdős Problems, Marking AI Advance in Pure Mathematics
An internal AI model at OpenAI has reportedly solved three previously unsolved mathematical problems from the Erdős collection. This development signals a potential leap in AI's capacity for abstract reasoning and formal theorem proving.
Stepwise Neuro-Symbolic Framework Proves 77.6% of seL4 Theorems, Surpassing LLM-Only Approaches
Researchers introduced Stepwise, a neuro-symbolic framework that automates proof search for systems verification. It combines fine-tuned LLMs with Isabelle REPL tools to prove 77.6% of seL4 theorems, significantly outperforming previous methods.
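The article does not detail Stepwise's internals, but the general shape of such a neuro-symbolic loop — a model proposes tactic candidates, a proof checker accepts or rejects each step — can be sketched as follows. All names here (`propose_tactics`, `apply_tactic`) are illustrative stand-ins, with the checker stubbed out in place of a real Isabelle REPL connection.

```python
from dataclasses import dataclass, field

@dataclass
class ProofState:
    goals: list                                 # open subgoals
    steps: list = field(default_factory=list)   # accepted (goal, tactic) pairs

def propose_tactics(goal: str) -> list:
    """Stand-in for the fine-tuned LLM: rank candidate tactics for a goal."""
    return ["simp", "auto", "induct"]           # placeholder candidates

def apply_tactic(goal: str, tactic: str):
    """Stand-in for the Isabelle REPL: return the remaining subgoals,
    or None if the tactic fails. Here only 'auto' closes a goal."""
    return [] if tactic == "auto" else None

def stepwise_prove(goal: str, max_steps: int = 16):
    """Greedy propose-and-check loop: every recorded step was
    machine-verified, so a finished proof needs no trusted LLM output."""
    state = ProofState(goals=[goal])
    for _ in range(max_steps):
        if not state.goals:
            return state.steps                  # all subgoals discharged
        current = state.goals.pop()
        for tactic in propose_tactics(current):
            remaining = apply_tactic(current, tactic)
            if remaining is not None:           # checker accepted this step
                state.goals.extend(remaining)
                state.steps.append((current, tactic))
                break
        else:
            return None                         # no candidate verified
    return None

print(stepwise_prove("length (rev xs) = length xs"))
```

The key property such frameworks exploit is that the symbolic checker, not the model, is the arbiter of correctness: the LLM only narrows the search.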
ChatGPT-5.2 Proves Mathematical Conjecture in Groundbreaking 'Vibe-Proving' Case Study
Researchers demonstrate ChatGPT-5.2 (Thinking) successfully resolving a mathematical conjecture about spectral regions through iterative 'vibe-proving' workflows. The case study reveals where AI assistance proves most valuable in research mathematics and where human expertise remains irreplaceable.
Trace2Skill Framework Distills Execution Traces into Declarative Skills via Parallel Sub-Agents
Researchers introduced Trace2Skill, a framework that uses parallel sub-agents to analyze execution trajectories and distill them into transferable declarative skills. This enables performance improvements in larger models without parameter updates.
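The teaser leaves the distillation mechanics open, but one simple way to see "traces in, reusable skills out" is frequency mining over action sequences: recurring sub-sequences across trajectories become skill candidates, which a model could then summarize declaratively. This toy sketch is an assumption about the general idea, not Trace2Skill's actual algorithm; each parallel sub-agent could mine its own shard and the counts be merged.

```python
from collections import Counter

def mine_skills(traces, n=2, min_count=2):
    """Toy distillation: action n-grams that recur across execution
    traces become candidate reusable skills."""
    counts = Counter()
    for trace in traces:
        for i in range(len(trace) - n + 1):
            counts[tuple(trace[i:i + n])] += 1
    return [list(gram) for gram, c in counts.items() if c >= min_count]

traces = [
    ["open_file", "search", "edit", "run_tests"],
    ["open_file", "search", "edit", "commit"],
]
print(mine_skills(traces))  # [['open_file', 'search'], ['search', 'edit']]
```

Because the extracted skills are data (not weights), a larger model can consume them in-context, which matches the article's claim of gains without parameter updates.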
GPT-5.4 Pro Reportedly Solves Open Problem in FrontierMath, With Human Verification
Researchers Kevin Barreto and Liam Price used GPT-5.4 Pro to produce a construction for an open problem in FrontierMath, which mathematician Will Brian confirmed. A formal write-up is planned for publication.
Learning to Disprove: LLMs Fine-Tuned for Formal Counterexample Generation in Lean 4
Researchers propose a method to train LLMs for formal counterexample generation, a neglected skill in mathematical AI. Their symbolic mutation strategy and multi-reward framework improve performance on three new benchmarks.
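For readers unfamiliar with the task, a formal counterexample in Lean 4 means refuting a universally quantified claim by exhibiting and checking a concrete witness. The example below is an illustration of the target skill, not taken from the paper: it disproves the false claim that n + n = n * n for all naturals by supplying the witness n = 1.

```lean
-- Disproving a false universal statement with an explicit witness:
-- at n = 1 the left side is 2 and the right side is 1.
example : ¬ ∀ n : Nat, n + n = n * n := by
  intro h
  exact absurd (h 1) (by decide)
```

Generating such disproofs is harder than it looks for LLMs: the model must both guess a witness and produce a proof term the kernel accepts, which is presumably why the authors treat it as a distinct skill.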
Terence Tao on AI's Impact: 'The Way We Do Everything, Including Mathematics, Will Change'
Fields Medalist Terence Tao states we are entering an unpredictable era where AI will fundamentally change how we do everything, including mathematics. He expressed a personal preference for a more stable, 'boring' period of continuity.
The Coming Revolution in AI Training: How Distributed Bounty Systems Will Unlock Next-Generation Models
AI development faces a bottleneck: specialized training environments built by small teams can't scale. A shift to distributed bounty systems, crowdsourcing expertise globally, promises to slash costs and accelerate progress across all advanced fields.
Mathematics Enters New Era as AI Generates Novel Proofs, Says Fields Medalist Terence Tao
Fields Medalist Terence Tao reveals AI is now producing unique mathematical proofs, though verification remains a bottleneck. He argues that to fully leverage AI, mathematicians must design problems that are easily checkable by both humans and machines.
AI Breakthrough: Large Language Models Now Tackling Complex Mathematical Proofs
Researchers have developed a neuro-symbolic system that combines LLMs with traditional constraint solvers to tackle inductive definitions—a notoriously difficult class of mathematical problems. Their approach improves solver performance by approximately 25% on proof tasks involving abstract data types and recurrence relations.
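The division of labor in such hybrid systems can be illustrated with a recurrence: the language model conjectures a closed form, and the symbolic side checks that it satisfies the base case and the recursive step. The sketch below uses a bounded check as a stand-in for a real constraint solver, and the recurrence and candidate are illustrative choices, not from the paper.

```python
def T(n):
    """A simple recurrence: T(0) = 0, T(n) = T(n-1) + n."""
    return 0 if n == 0 else T(n - 1) + n

def closed_form(n):
    """Candidate an LLM might conjecture: n(n+1)/2."""
    return n * (n + 1) // 2

def check_conjecture(bound=50):
    """Stand-in for the solver: verify the base case and the inductive
    step closed_form(n) = closed_form(n-1) + n up to a bound. A real
    constraint solver would discharge the step symbolically, for all n."""
    if closed_form(0) != T(0):
        return False
    return all(closed_form(n) == closed_form(n - 1) + n
               for n in range(1, bound + 1))

print(check_conjecture())  # True: the candidate survives the bounded check
```

The reported ~25% gain plausibly comes from exactly this split: conjecturing is where LLMs help, while the solver retains responsibility for soundness.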
The Benchmark Race: AI's Mathematical Prowess Now Outpacing Our Ability to Measure It
AI systems are advancing in mathematical reasoning at such an unprecedented rate that researchers are struggling to create benchmarks fast enough to properly evaluate their capabilities. This acceleration signals a fundamental shift in how we measure and understand artificial intelligence development.
Bridging Human Language and Machine Logic: New AI Framework Achieves Near-Perfect Translation Accuracy
Researchers have developed NL2LOGIC, an AI framework that translates natural language into formal logic with 99% syntactic accuracy. By using abstract syntax trees as an intermediate representation, the system dramatically improves semantic correctness and downstream reasoning performance.
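The reason an AST intermediate helps is that rendering a formula from a typed tree cannot produce malformed output, whereas generating logic strings token-by-token can. This miniature pipeline handles a single sentence pattern to make the idea concrete; the AST shapes and the `translate` function are illustrative assumptions, not NL2LOGIC's actual design.

```python
from dataclasses import dataclass

# Minimal AST for one fragment of first-order logic (illustrative only).
@dataclass
class Pred:
    name: str
    var: str
    def render(self):
        return f"{self.name}({self.var})"

@dataclass
class ForallImplies:
    var: str
    antecedent: Pred
    consequent: Pred
    def render(self):
        return (f"forall {self.var}. "
                f"{self.antecedent.render()} -> {self.consequent.render()}")

def translate(sentence: str):
    """Toy front end for the pattern 'Every X is a Y.' Building the AST
    first, then rendering, guarantees syntactically valid output."""
    words = sentence.lower().rstrip(".").split()
    if words[0] == "every" and words[2:4] == ["is", "a"]:
        return ForallImplies("x", Pred(words[1], "x"), Pred(words[4], "x"))
    raise ValueError("pattern not covered by this sketch")

print(translate("Every human is a mortal.").render())
# forall x. human(x) -> mortal(x)
```

The 99% syntactic accuracy figure is consistent with this design choice: syntax errors are ruled out by construction, so remaining failures are semantic.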
AI Agents Now Design Their Own Training Data: The Breakthrough in Self-Evolving Logic Systems
Researchers have developed SSLogic, an agentic meta-synthesis framework that enables AI systems to autonomously create and refine their own logic reasoning training data through a continuous generate-validate-repair loop, achieving significant performance improvements across multiple benchmarks.
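The generate-validate-repair loop described here has a simple skeleton: propose a training item, check it against a symbolic validator, and patch inconsistencies before emitting. The sketch below uses a trivially checkable modus-ponens item; all components are hypothetical stand-ins for SSLogic's actual generator, validator, and repair model.

```python
import random

def generate():
    """Stand-in generator: a modus-ponens item, sometimes mislabeled."""
    label = random.choice([True, False])
    return {"premises": ["p -> q", "p"], "conclusion": "q", "label": label}

def validate(item):
    """Stand-in validator: modus ponens entails the conclusion,
    so only label=True is consistent with the premises."""
    return item["label"] is True

def repair(item):
    """Minimal repair: correct the inconsistent label."""
    return {**item, "label": True}

def synthesize(n=4, seed=0):
    """Generate-validate-repair loop: every emitted item passes validation,
    so the resulting training data is consistent by construction."""
    random.seed(seed)
    data = []
    while len(data) < n:
        item = generate()
        if not validate(item):
            item = repair(item)
        data.append(item)
    return data

dataset = synthesize()
print(all(validate(item) for item in dataset))  # True
```

The validator is the load-bearing piece: because it is symbolic rather than learned, the loop can run autonomously without the generator's errors compounding into the training set.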