In a recent discussion, DeepMind co-founder and CEO Demis Hassabis proposed a novel, historically grounded test for defining Artificial General Intelligence (AGI). The test, dubbed the "Einstein test," offers a concrete benchmark that moves beyond abstract definitions of intelligence.
Key Takeaways
- Demis Hassabis has proposed a novel benchmark for AGI: a model trained only on human knowledge up to 1911 must independently derive Einstein's theory of general relativity.
- This shifts the definition of AGI from abstract capability claims to a specific, historical act of scientific discovery.
What is the 'Einstein Test'?

The test is conceptually straightforward but operationally daunting. The proposal is to:
- Train an AI model on the entirety of human knowledge, but with a strict cutoff date of 1911. This includes all scientific literature, mathematics, and empirical data available up to that point.
- Challenge the model to independently discover the theory of general relativity. The model must reason from the knowledge base of 1911 to derive the principles that Albert Einstein formulated and published between 1915 and 1916.
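The two-step protocol above can be sketched as a simple specification. This is purely illustrative: the class name, fields, and success criteria below are hypothetical scaffolding, not an actual benchmark definition from Hassabis or DeepMind.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class EinsteinTestSpec:
    """Hypothetical encoding of the proposed 'Einstein test' protocol."""
    # Step 1: train only on material published up to the end of 1911.
    knowledge_cutoff: date = date(1911, 12, 31)
    # Step 2: the model must derive this theory from that corpus alone.
    target_theory: str = "general relativity"
    # Anomalies already present in the 1911 knowledge base that the
    # model would have to weigh correctly:
    known_anomalies: tuple = (
        "anomalous precession of Mercury's perihelion",
        "null result of the Michelson-Morley experiment",
    )
    # Illustrative criteria a derivation might need to satisfy to count
    # as a genuine rediscovery rather than a plausible reconstruction:
    success_criteria: tuple = (
        "derives field equations linking spacetime curvature to energy-momentum",
        "recovers Newtonian gravity in the weak-field limit",
        "quantitatively predicts Mercury's perihelion shift",
    )

spec = EinsteinTestSpec()
```

Even this toy spec makes one point concrete: the hard part is not stating the task but defining machine-checkable success criteria for an open-ended discovery.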
According to Hassabis's framing, if an AI system can successfully complete this task—rediscovering one of the most significant scientific breakthroughs of the 20th century from its historical antecedents—it would constitute a strong demonstration of AGI.
Context: The Elusive Definition of AGI
The AI field has long struggled with a concrete, operational definition for AGI. Definitions often revolve around broad capabilities like "performing any intellectual task that a human can" or achieving human-level performance across a wide range of domains. These definitions are useful for philosophical discussion but are notoriously difficult to translate into measurable benchmarks for researchers.
Hassabis's proposal injects specificity into this debate. It anchors the test in a real-world, historical example of exceptional human creativity and deductive reasoning. The year 1911 is significant; physics was in flux. Special relativity (1905) had upended Newtonian mechanics, the anomalous precession of Mercury's perihelion was a known puzzle, the null result of the Michelson-Morley experiment had undermined the luminiferous aether, and Einstein had already articulated the equivalence principle (1907)—yet no complete theory of gravitation consistent with relativity existed. Einstein's genius was in synthesizing these known facts with a profound new principle, the curvature of spacetime, to produce a revolutionary theory.
Why This Test is Demanding

The "Einstein test" is not merely a question of information retrieval or pattern matching. It requires several capabilities that current AI systems lack:
- Causal & Counterfactual Reasoning: The model must reason about physical causes and effects beyond correlation.
- Creative Synthesis: It must combine known concepts (e.g., gravity, geometry, relativity of motion) in novel ways to formulate a new theoretical framework.
- Mathematical Derivation: It must be capable of the complex mathematical formalism required to express the theory.
- Scientific Intuition: It must identify which anomalies in the 1911 knowledge base are critical and which are peripheral, guiding its search for a new theory.
Passing this test would demonstrate not just mastery of existing knowledge, but the ability to extend the frontier of knowledge itself—a hallmark of general intelligence.
Agentic.news Analysis
Demis Hassabis's proposal is a significant contribution to the ongoing discourse on AGI benchmarks, coming from a leader whose organization has consistently pushed the envelope on AI capabilities. This follows DeepMind's history of defining and achieving concrete intelligence milestones, from AlphaGo mastering Go to AlphaFold solving the protein folding problem. The "Einstein test" aligns with this pattern of setting clear, audacious goals.
This proposal also intersects with recent trends in AI evaluation. There is growing dissatisfaction with static benchmarks that can be memorized or overfitted. The community is shifting towards dynamic, process-oriented evaluations that test reasoning, such as the GAIA benchmark for general AI assistants or SWE-Bench for coding agents that must edit real codebases. Hassabis's test takes this a step further into the domain of open-ended scientific discovery.
However, the test presents immense practical challenges. Creating a faithful "1911-world" knowledge base for training is a monumental task in itself, involving digitization, translation, and contextual understanding of historical scientific paradigms. Furthermore, evaluating whether a model's output constitutes a "discovery" of general relativity, as opposed to a plausible-sounding reconstruction, would require rigorous peer review by physicists—essentially subjecting the AI to the same scrutiny as a human scientist.
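Even the first step of that pipeline—assembling a leak-free 1911 corpus—is non-trivial: every document needs a reliable publication date, and anything at or after the cutoff must be excluded. A minimal sketch of such a filter, with illustrative document records (real historical corpora would also require digitization, translation, and deduplication, which this omits):

```python
from datetime import date

# Strict cutoff: only material published before 1912 may enter training.
CUTOFF = date(1912, 1, 1)

def filter_corpus(documents):
    """Keep only documents published strictly before the cutoff.

    Each document is assumed to be a dict with a 'title' and a
    'published' date; the record format here is hypothetical.
    """
    return [doc for doc in documents if doc["published"] < CUTOFF]

docs = [
    {"title": "On the Electrodynamics of Moving Bodies",
     "published": date(1905, 9, 26)},   # special relativity: in-corpus
    {"title": "The Field Equations of Gravitation",
     "published": date(1915, 11, 25)},  # general relativity: excluded
]
kept = filter_corpus(docs)
```

The deeper difficulty the date filter cannot address is contamination by hindsight: modern editions, commentaries, and translations of pre-1911 works all encode post-1911 knowledge.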
Practically, this test is likely decades away from being passable. Yet, as a north star, it provides a compelling and specific target for AGI research that is rooted in a celebrated human intellectual achievement. It moves the conversation from "when will we know it?" to "what must it concretely do?"
Frequently Asked Questions
What is the 'Einstein test' for AGI?
The "Einstein test," proposed by DeepMind CEO Demis Hassabis, is a proposed benchmark for Artificial General Intelligence. It involves training an AI model on all human knowledge available up to the year 1911 and then challenging it to independently discover the theory of general relativity, which Einstein published in 1915-1916. Success would indicate a level of creative scientific reasoning akin to human genius.
Why is 1911 the cutoff date for the test?
The year 1911 is strategically chosen because it precedes Einstein's final formulation of general relativity. By this date, key empirical puzzles (like Mercury's orbit) and theoretical concepts (special relativity, the equivalence principle) were known, but no one had yet synthesized them into the complete theory. It represents the "state of the art" knowledge from which a breakthrough had to be made.
How is this different from current AI benchmarks?
Most current AI benchmarks test proficiency within existing knowledge frameworks—answering questions, solving predefined puzzles, or generating text based on patterns. The Einstein test is fundamentally different: it evaluates the ability to create new knowledge that was not present in the training data, requiring leaps of intuition, causal reasoning, and theoretical synthesis.
Is any AI close to passing this test?
No current AI system is remotely close to passing the Einstein test. While large language models can describe general relativity and its history, they are recalling and recombining known information. The test requires deriving the theory ab initio from a pre-1911 worldview, a task of open-ended discovery that remains far beyond the capabilities of today's pattern-based models.