The 1911 Test: Demis Hassabis Proposes a Novel Benchmark for True Artificial General Intelligence
In a thought-provoking statement that has sparked discussion across the AI research community, DeepMind co-founder and CEO Demis Hassabis has proposed what he calls a definitive test for true Artificial General Intelligence (AGI): Could an AI system, trained exclusively on knowledge available up to the year 1911, independently discover Einstein's theory of general relativity?
This conceptual benchmark, shared via social media and further elaborated in related discussions, cuts to the heart of what separates today's advanced but narrow AI systems from the still-hypothetical goal of AGI—machines with the flexible, creative, and general problem-solving abilities of human intelligence.
Why 1911? The Significance of the Cutoff
The year 1911 is not arbitrary. It represents a pivotal moment in scientific history, just four years before Albert Einstein published his complete field equations of general relativity in 1915. By 1911, the scientific community possessed significant pieces of the puzzle:
- Newtonian mechanics and gravitation were well-established but showing cracks under extreme conditions.
- Einstein had already published his special theory of relativity (1905).
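- The equivalence principle (the idea that gravitational mass is identical to inertial mass, so that a uniform gravitational field is locally indistinguishable from acceleration) had been stated by Einstein in 1907, and in 1911 he used it to predict that gravity bends light.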
- Observations like the anomalous precession of Mercury's orbit were known but unexplained by Newtonian physics.
An AI trained only up to this point would have access to the known contradictions and open questions but not the revolutionary synthesis that Einstein achieved. The test, therefore, is not about recalling a known fact but about engaging in genuine scientific discovery—connecting disparate dots, formulating novel hypotheses, and deriving a fundamentally new framework to explain observed phenomena.
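To give a sense of scale for the Mercury anomaly mentioned above, the short Python sketch below evaluates the general-relativistic formula for perihelion advance per orbit, 6*pi*G*M / (c^2 * a * (1 - e^2)). This formula is part of the post-1911 theory, so it is exactly what a system passing the test would have to arrive at on its own. The orbital constants are standard approximate values supplied here for illustration; they do not come from Hassabis's statement.

```python
import math

# Approximate physical constants (assumed reference values, SI units).
GM_SUN = 1.32712440018e20   # gravitational parameter of the Sun, m^3/s^2
C = 299_792_458.0           # speed of light, m/s

# Approximate orbital elements of Mercury (assumed reference values).
A_MERCURY = 5.7909e10       # semi-major axis, m
E_MERCURY = 0.2056          # orbital eccentricity
PERIOD_DAYS = 87.969        # orbital period, days


def perihelion_advance_arcsec_per_century(gm, a, e, period_days):
    """Relativistic perihelion advance, in arcseconds per Julian century."""
    # Extra rotation of the orbit's ellipse per revolution, in radians.
    per_orbit_rad = 6 * math.pi * gm / (C ** 2 * a * (1 - e ** 2))
    # Number of revolutions in a Julian century (36525 days).
    orbits_per_century = 36525.0 / period_days
    # Convert radians -> degrees -> arcseconds.
    return math.degrees(per_orbit_rad * orbits_per_century) * 3600


advance = perihelion_advance_arcsec_per_century(
    GM_SUN, A_MERCURY, E_MERCURY, PERIOD_DAYS
)
print(f"Predicted advance: {advance:.1f} arcseconds per century")
```

With these inputs the result comes out close to 43 arcseconds per century, which is the size of the discrepancy that Newtonian calculations left unexplained. The effect is tiny, and that is part of why it was hard to read as a sign that a new theory of gravity was needed.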
The Limitations of Current AI Systems
Hassabis's core assertion is that "today's systems can't do it." This highlights a critical limitation of even the most advanced contemporary AI, including large language models (LLMs) like GPT-4, Claude 3, and Google's own Gemini. These systems are highly proficient at recognizing patterns, synthesizing existing information, and generating text that mimics understanding. However, their knowledge is fundamentally retrospective and interpolative: it is drawn from, and largely confined to, what already appears in their training data.
A modern LLM can eloquently explain general relativity because it has been trained on terabytes of text that include descriptions of the theory, its derivations, and its consequences. Its "understanding" is a statistical mapping of how humans discuss the concept. It cannot, from first principles and a constrained knowledge base, perform the leap of creative insight and mathematical derivation that Einstein did. It lacks the capacity for the kind of constrained abstraction, thought experimentation (like Einstein's famous elevator gedankenexperiment), and fundamentally new model-building that defines breakthrough science.
What Would Passing the Test Require?
For an AI to pass Hassabis's 1911 test, it would need capabilities that are hallmarks of AGI:
- Deep Causal Reasoning: Moving beyond correlational patterns to build internal models of how the universe must work, identifying inconsistencies in existing theories (like the mismatch between Newtonian gravity and special relativity).
- Creative Abstraction and Hypothesis Generation: Proposing entirely new foundational concepts, such as treating gravity as the curvature of spacetime rather than as a force.
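- Mathematical Invention: Recognizing that gravity calls for the mathematics of curved spaces, which already existed by 1911 in Riemannian geometry and the absolute differential calculus (tensor calculus) of Ricci and Levi-Civita, and then deriving the field equations from first principles and the new conceptual framework.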
- Autonomous Goal-Directed Research: Defining its own problems ("How can I reconcile these observations with this principle?") and pursuing a multi-step research pathway without human guidance.
This goes far beyond today's AI-assisted discovery tools, which help scientists explore vast literature or run simulations. It implies an AI that can act as a primary discoverer.
Implications for the Path to AGI
Hassabis's test reframes the AGI conversation from one of scale—bigger models, more data, more compute—to one of architectural and algorithmic breakthrough. It suggests that the key missing ingredient is not more knowledge, but a new form of reasoning engine.
This aligns with growing research directions in AI:
- Hybrid Neuro-Symbolic Systems: Combining the pattern recognition of neural networks with the logical, rule-based reasoning of symbolic AI.
- AI Scientists: Projects aimed at creating systems that can autonomously form and test scientific hypotheses, though currently in limited domains.
- Reinforcement Learning for Discovery: Using reward frameworks that incentivize the finding of novel, parsimonious, and empirically accurate theories.
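As a rough illustration of the last item, a reward that favors novel, parsimonious, and empirically accurate theories could be sketched as below. This is a toy under stated assumptions, not a description of any system built by DeepMind or anyone else: the `Candidate` class, the `theory_reward` function, the weights, and the data are all hypothetical, and "novelty" is reduced to a simple check against a set of known theory names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set


@dataclass
class Candidate:
    """A hypothetical candidate theory: a named predictor with a parameter count."""
    name: str
    predict: Callable[[float], float]
    n_params: int


def theory_reward(
    candidate: Candidate,
    xs: Sequence[float],
    ys: Sequence[float],
    known_names: Set[str],
    complexity_weight: float = 0.1,  # assumed penalty per free parameter
    novelty_bonus: float = 0.5,      # assumed bonus for a theory not already known
) -> float:
    # Empirical accuracy: mean squared error against the observations.
    mse = sum((candidate.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    # Parsimony: penalize theories with more free parameters.
    reward = -mse - complexity_weight * candidate.n_params
    # Novelty: reward theories that are not in the known set.
    if candidate.name not in known_names:
        reward += novelty_bonus
    return reward


# Toy observations that follow y = 2x exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]
known = {"constant"}

constant = Candidate("constant", lambda x: 3.0, n_params=1)
linear = Candidate("linear", lambda x: 2.0 * x, n_params=1)
padded = Candidate("padded-cubic", lambda x: 2.0 * x + 0.0 * x ** 3, n_params=3)

for c in (constant, linear, padded):
    print(c.name, theory_reward(c, xs, ys, known))
```

In this toy, the linear and padded-cubic candidates fit the data equally well, but the linear one scores higher because it uses fewer parameters. The already-known constant theory scores lowest because it is both inaccurate and not novel. Real discovery settings would need far richer notions of accuracy, simplicity, and novelty than this.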
Hassabis's prediction that such a capability "might be possible in a few years" is characteristically optimistic but points to the accelerated pace of foundational AI research. It implies that DeepMind and others may have specific research trajectories aimed at this kind of generative reasoning.
Broader Philosophical and Practical Questions
The 1911 test raises profound questions beyond technical AI research:
- The Nature of Discovery: If an AI can rediscover a human breakthrough, does it understand it in the same way? Does that matter?
- The Future of Science: Passing this test would herald a new era of AI as a co-equal partner in fundamental research, potentially accelerating progress in physics, medicine, and materials science.
- Benchmarking AGI: The AI community lacks consensus on definitive AGI benchmarks. Hassabis's proposal is a concrete, historically grounded challenge that could become a standard milestone, much like the Turing Test (though far more rigorous).
Conclusion: A North Star for AGI Research
Demis Hassabis's 1911 test provides a clear and ambitious north star for AGI research. It shifts the goal from systems that can learn and apply what humans already know to systems that can generate new knowledge the way humans do, through creative, constrained, and groundbreaking thought. While today's AI remains a powerful tool for amplifying human intelligence, passing this test would signal the arrival of a new kind of intelligence capable of standing on the shoulders of giants, and then seeing further.
The journey to an AI that can rediscover general relativity from a pre-1911 worldview will likely require innovations we have only begun to imagine. But in defining the challenge so precisely, Hassabis has given the field a powerful lens through which to measure true progress toward artificial general intelligence.
Source: Statement by Demis Hassabis via Twitter/X (@kimmonismus), referencing and expanding on the conceptual benchmark for AGI.



