The 1911 Test: Demis Hassabis Proposes a Novel Benchmark for True Artificial General Intelligence

DeepMind CEO Demis Hassabis suggests that a true test for AGI would be an AI trained only with pre-1911 knowledge discovering general relativity. This benchmark challenges current systems and redefines progress toward human-like reasoning.

Feb 23, 2026 · via @kimmonismus

In a thought-provoking statement that has sparked discussion across the AI research community, DeepMind co-founder and CEO Demis Hassabis has proposed what he calls a definitive test for true Artificial General Intelligence (AGI): Could an AI system, trained exclusively on knowledge available up to the year 1911, independently discover Einstein's theory of general relativity?

This conceptual benchmark, shared via social media and further elaborated in related discussions, cuts to the heart of what separates today's advanced but narrow AI systems from the still-hypothetical goal of AGI—machines with the flexible, creative, and general problem-solving abilities of human intelligence.

Why 1911? The Significance of the Cutoff

The year 1911 is not arbitrary. It represents a pivotal moment in scientific history, just four years before Albert Einstein published his complete field equations of general relativity in 1915. By 1911, the scientific community possessed significant pieces of the puzzle:

  • Newtonian mechanics and gravitation were well-established but showing cracks under extreme conditions.
  • Einstein had already published his special theory of relativity (1905).
  • The equivalence principle (the idea that gravitational mass is identical to inertial mass) was being formulated.
  • Observations like the anomalous precession of Mercury's orbit were known but unexplained by Newtonian physics.

An AI trained only up to this point would have access to the known contradictions and open questions but not the revolutionary synthesis that Einstein achieved. The test, therefore, is not about recalling a known fact but about engaging in genuine scientific discovery—connecting disparate dots, formulating novel hypotheses, and deriving a fundamentally new framework to explain observed phenomena.
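The Mercury anomaly gives a concrete sense of the evidence a pre-1911 reasoner would have to explain. As an illustrative sketch (not part of Hassabis's proposal), the general-relativistic perihelion-advance formula that such an AI would ultimately need to derive reproduces the observed anomaly of roughly 43 arcseconds per century, using standard published values for the Sun and Mercury:

```python
import math

# Perihelion advance per orbit predicted by general relativity:
#   delta_phi = 6 * pi * G * M / (c^2 * a * (1 - e^2))
# Constants and Mercury's orbital elements are standard published values.
GM_SUN = 1.32712440018e20   # gravitational parameter of the Sun, m^3/s^2
C = 2.99792458e8            # speed of light, m/s
A = 5.7909e10               # Mercury's semi-major axis, m
E = 0.20563                 # Mercury's orbital eccentricity
PERIOD_DAYS = 87.969        # Mercury's orbital period, days

delta_phi = 6 * math.pi * GM_SUN / (C**2 * A * (1 - E**2))  # radians per orbit

orbits_per_century = 36525 / PERIOD_DAYS
arcsec_per_rad = 180 * 3600 / math.pi
advance = delta_phi * orbits_per_century * arcsec_per_rad

print(f"{advance:.1f} arcsec/century")  # close to the observed anomalous ~43"
```

The observed excess precession was known decades before 1911; what was missing was a framework from which this number falls out, which is exactly the leap the test demands.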

The Limitations of Current AI Systems

Hassabis's core assertion is that "today's systems can't do it." This highlights a critical limitation of even the most advanced contemporary AI, including large language models (LLMs) like GPT-4, Claude 3, and Google's own Gemini. These systems are remarkably proficient at pattern recognition, at synthesizing existing information, and at generating text that mimics understanding. However, their knowledge is fundamentally retrospective and interpolative.

A modern LLM can eloquently explain general relativity because it has been trained on terabytes of text that include descriptions of the theory, its derivations, and its consequences. Its "understanding" is a statistical mapping of how humans discuss the concept. It cannot, from first principles and a constrained knowledge base, perform the leap of creative insight and mathematical derivation that Einstein did. It lacks the capacity for the kind of constrained abstraction, thought experimentation (like Einstein's famous elevator gedankenexperiment), and fundamentally new model-building that defines breakthrough science.

What Would Passing the Test Require?

For an AI to pass Hassabis's 1911 test, it would need capabilities that are hallmarks of AGI:

  1. Deep Causal Reasoning: Moving beyond correlational patterns to build internal models of how the universe must work, identifying inconsistencies in existing theories (like the mismatch between Newtonian gravity and special relativity).
  2. Creative Abstraction and Hypothesis Generation: Proposing entirely new foundational concepts, such as the curvature of spacetime as the manifestation of gravity, rather than as a force.
  3. Mathematical Invention: Formally deriving the necessary tensor calculus and field equations from first principles and the new conceptual framework.
  4. Autonomous Goal-Directed Research: Defining its own problems ("How can I reconcile these observations with this principle?") and pursuing a multi-step research pathway without human guidance.

This goes far beyond today's AI-assisted discovery tools, which help scientists explore vast literature or run simulations. It implies an AI that can act as a primary discoverer.

Implications for the Path to AGI

Hassabis's test reframes the AGI conversation from one of scale—bigger models, more data, more compute—to one of architectural and algorithmic breakthrough. It suggests that the key missing ingredient is not more knowledge, but a new form of reasoning engine.

This aligns with growing research directions in AI:

  • Hybrid Neuro-Symbolic Systems: Combining the pattern recognition of neural networks with the logical, rule-based reasoning of symbolic AI.
  • AI Scientists: Projects aimed at creating systems that can autonomously form and test scientific hypotheses, though currently in limited domains.
  • Reinforcement Learning for Discovery: Using reward frameworks that incentivize the finding of novel, parsimonious, and empirically accurate theories.
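That last direction can be made concrete with a toy sketch (my own illustration, not a DeepMind method): frame the discovery reward as empirical fit minus a complexity penalty, so that among candidate "theories" the simplest one that explains the data scores highest. Here the candidate laws, the penalty weight `LAMBDA`, and the complexity measure are all assumptions for illustration:

```python
# Toy parsimony-weighted theory score (illustrative only):
#   score(theory) = -mean_squared_error - LAMBDA * complexity
import random

random.seed(0)
xs = [0.1 * i for i in range(1, 50)]
ys = [2.0 * x**2 + random.gauss(0, 0.01) for x in xs]  # hidden law: y = 2x^2

# Candidate laws, each paired with a crude complexity count (number of terms).
candidates = {
    "linear":    (lambda x: 2.0 * x,                     1),
    "quadratic": (lambda x: 2.0 * x**2,                  2),
    "cubic":     (lambda x: 2.0 * x**2 + x**3 / 1000,    3),
}

LAMBDA = 0.01  # strength of the parsimony penalty (assumed)

def score(f, complexity):
    mse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return -mse - LAMBDA * complexity

best = max(candidates, key=lambda name: score(*candidates[name]))
print(best)  # the quadratic law wins: it fits well and stays simple
```

The linear law fits poorly; the cubic law fits about as well as the quadratic but pays a complexity penalty, so the true underlying form wins. Real discovery systems search vastly larger hypothesis spaces, but the fit-versus-parsimony trade-off is the same.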

Hassabis's prediction that such a capability "might be possible in a few years" is characteristically optimistic but points to the accelerated pace of foundational AI research. It implies that DeepMind and others may have specific research trajectories aimed at this kind of generative reasoning.

Broader Philosophical and Practical Questions

The 1911 test raises profound questions beyond technical AI research:

  • The Nature of Discovery: If an AI can rediscover a human breakthrough, does it understand it in the same way? Does that matter?
  • The Future of Science: Passing this test would herald a new era of AI as a co-equal partner in fundamental research, potentially accelerating progress in physics, medicine, and materials science.
  • Benchmarking AGI: The AI community lacks consensus on definitive AGI benchmarks. Hassabis's proposal is a concrete, historically grounded challenge that could become a standard milestone, much like the Turing Test (though far more rigorous).

Conclusion: A North Star for AGI Research

Demis Hassabis's 1911 test provides a brilliantly clear and ambitious north star for AGI research. It moves the goalposts from systems that can learn and apply anything a human knows to systems that can know in the way humans do—through creative, constrained, and groundbreaking thought. While today's AI remains a powerful tool for amplifying human intelligence, passing this test would signal the arrival of a new kind of intelligence capable of standing on the shoulders of giants, and then seeing further.

The journey to an AI that can rediscover general relativity from a pre-1911 worldview will likely require innovations we have only begun to imagine. But in defining the challenge so precisely, Hassabis has given the field a powerful lens through which to measure true progress toward artificial general intelligence.

Source: Statement by Demis Hassabis, shared via Twitter/X by @kimmonismus, referencing and expanding on the conceptual benchmark for AGI.

AI Analysis

Demis Hassabis's proposed test is significant because it operationalizes a core, yet often vague, tenet of AGI: the capacity for *de novo* discovery, not just sophisticated recall or interpolation. By anchoring the test in a specific historical epistemic state (pre-1911 physics), it creates a clean-room experiment that isolates the ability for generative reasoning from the ability to access and recombine existing knowledge of the solution. This is a direct challenge to the paradigm of scaling existing architectures, suggesting that a qualitative leap in reasoning ability is required. The implications are twofold. For the AI research community, it provides a concrete, high-stakes benchmark that could guide architecture development toward stronger forms of causal modeling, abstraction, and autonomous research. Philosophically, it forces a re-evaluation of what we mean by 'understanding.' An AI passing this test would not just be manipulating symbols that correlate with human concepts; it would be deriving those concepts from a constrained universe of evidence, arguably demonstrating a form of understanding grounded in the structure of the world itself, independent of human exposition.
