AI Models Investigate Prehistoric Mysteries: How GPT-5.4, Claude Opus, and Gemini DeepThink Tackled the Dinosaur Civilization Question
In an experiment that blends artificial intelligence with paleontological inquiry, researcher Ethan Mollick recently gave three leading AI models (GPT-5.4 Pro, Claude Opus, and Gemini DeepThink) an unusual task: "Prove to me in a PowerPoint that there was no advanced dinosaur civilization by downloading whatever data you think appropriate & running tests." The seemingly whimsical prompt offers a window into how modern AI systems approach complex historical questions and conduct original research.
The Experimental Setup
The challenge presented to these AI systems wasn't merely a test of factual knowledge about dinosaur history, but rather an examination of how these models can structure arguments, gather evidence, and present conclusions in a coherent format. According to Mollick's reporting on the experiment, each model was asked to create a PowerPoint presentation that would systematically address the question of whether advanced dinosaur civilizations could have existed.
What makes this experiment particularly noteworthy is the instruction to proceed "by downloading whatever data you think appropriate & running tests." This directive pushes beyond simple information retrieval into original analysis: it asks the AI to determine what evidence would be relevant, how to obtain it, and what analytical methods would be appropriate for evaluating it.
Divergent Approaches and Capabilities
Mollick's findings indicate that GPT-5.4 Pro and Claude Opus both engaged in what he describes as "original analyses" of the question. Rather than simply restating established scientific consensus, these models appear to have structured logical arguments about what evidence would be needed to prove or disprove the existence of an advanced dinosaur civilization.
The source does not list the models' individual steps, but their approaches likely included considerations of:
- Geological evidence and fossil records
- Archaeological markers of civilization (tools, structures, artifacts)
- Chemical and isotopic signatures in rock layers
- Comparative analysis with known civilizations
- Statistical evaluation of absence of evidence versus evidence of absence
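The last item, absence of evidence versus evidence of absence, can be made concrete with a Bayesian update. The following is a minimal sketch and is not taken from any of the models' presentations: the function name, the prior, and the per-search detection probability are illustrative assumptions, and the calculation assumes that searches are independent and never produce false positives.

```python
def posterior_given_no_detection(prior: float, detect_prob: float, n_searches: int) -> float:
    """Return P(H | nothing found) after n independent searches.

    H is the hypothesis that a civilization existed. Each search is assumed
    to find evidence with probability `detect_prob` if H is true, and never
    if H is false (no false positives). All inputs are hypothetical.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be in [0, 1]")
    if not 0.0 <= detect_prob <= 1.0:
        raise ValueError("detect_prob must be in [0, 1]")
    if n_searches < 0:
        raise ValueError("n_searches must be non-negative")

    # Probability that every search comes up empty even though H is true.
    miss_if_true = (1.0 - detect_prob) ** n_searches

    # Bayes' rule; P(no detection | not H) is 1 under the no-false-positive assumption.
    numerator = prior * miss_if_true
    denominator = numerator + (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("the observation is impossible under these inputs")
    return numerator / denominator


if __name__ == "__main__":
    # Made-up numbers, chosen only to show how the posterior shrinks
    # as the number of empty-handed searches grows.
    for n in (0, 100, 1_000, 10_000):
        p = posterior_given_no_detection(prior=0.01, detect_prob=0.001, n_searches=n)
        print(f"{n:>6} searches with no evidence -> P(H) = {p:.6f}")
```

The point of the sketch is that an empty search only counts as evidence of absence to the extent that the search could have succeeded: with a detection probability of zero, the posterior never moves from the prior.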
The most intriguing development, however, involved Gemini DeepThink, about which Mollick's post says "someone build a harness." The post gives no further detail, but the remark may suggest that specialized tools or frameworks have been developed around DeepThink, potentially allowing more sophisticated data gathering, hypothesis testing, or presentation generation than is possible through standard interfaces.
Implications for AI Research Methodology
This experiment represents more than just an amusing thought exercise—it demonstrates several important developments in AI capabilities:
1. Complex Problem Structuring: The models weren't just answering a question but designing an entire research methodology. This indicates progress toward AI systems that can plan and execute multi-step analytical processes rather than simply responding to direct queries.
2. Evidence Evaluation: By asking the AI to determine what data would be appropriate, the experiment tests how well these systems understand evidentiary standards and can distinguish between relevant and irrelevant information.
3. Scientific Argumentation: Creating a persuasive PowerPoint requires not just facts but logical structure, appropriate visualizations, and coherent narrative—skills that extend beyond simple information retrieval.
4. Interdisciplinary Thinking: The question inherently bridges paleontology, archaeology, geology, and even speculative history, requiring the AI to integrate knowledge across domains.
The Broader Context of AI in Scientific Inquiry
This experiment occurs against a backdrop of increasing AI involvement in scientific research. Recent years have seen AI systems:
- Predicting protein structures with AlphaFold
- Discovering new materials through computational screening
- Analyzing astronomical data for exoplanet detection
- Assisting with literature reviews and hypothesis generation
What makes Mollick's experiment distinctive is its focus on historical rather than experimental sciences, and its emphasis on argument construction rather than pure data analysis. The ability to create persuasive presentations based on evidence evaluation represents a different type of intellectual labor than pattern recognition in large datasets.
Technical Considerations and Limitations
While the reported results sound impressive, several important questions remain about what exactly these AI systems accomplished:
Data Access Limitations: When instructed to "download whatever data you think appropriate," what actual data sources were available to these models? Were they limited to publicly available datasets, or did they simulate data gathering?
Analytical Depth: How sophisticated were the "tests" these models proposed? Did they suggest statistical analyses, comparative studies, or experimental approaches that go beyond surface-level examination?
Originality vs. Recombination: To what extent were the models' outputs truly original analyses versus clever recombinations of existing arguments about dinosaur civilization hypotheses?
Evaluation Framework: Without seeing the actual PowerPoint outputs, it's difficult to assess the quality of the arguments presented. What would constitute a "good" proof that no advanced dinosaur civilization existed?
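As one illustration of what a test beyond surface-level examination could look like, consider a zero-event bound. If n independently sampled sites are examined and none contains an artifact, the exact one-sided upper confidence bound on the per-site artifact rate is 1 - alpha^(1/n), which the familiar "rule of three" approximates as 3/n at 95% confidence. This is a hedged sketch under those assumptions; the function name and sample counts are hypothetical and are not drawn from Mollick's experiment or the models' outputs.

```python
def zero_event_upper_bound(n_samples: int, alpha: float = 0.05) -> float:
    """Upper confidence bound on an event rate when 0 events were seen.

    Solves (1 - p) ** n_samples = alpha for p, i.e. the largest rate that
    would still produce zero hits in n independent samples with
    probability at least alpha.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return 1.0 - alpha ** (1.0 / n_samples)


if __name__ == "__main__":
    # Hypothetical sample sizes; compare the exact bound with the 3/n shortcut.
    for n in (10, 100, 1_000):
        exact = zero_event_upper_bound(n)
        print(f"n={n:>5}: exact bound = {exact:.5f}, rule of three = {3 / n:.5f}")
```

A bound like this never reaches zero, which is one reason a "proof" of nonexistence in the strict sense is out of reach: the most a sample can show is that artifacts, if any exist, must be rarer than some threshold.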
Future Directions and Applications
The techniques demonstrated in this experiment have potential applications far beyond prehistoric mysteries:
Educational Tools: AI systems that can structure arguments and gather evidence could become powerful teaching assistants for critical thinking and research methodology courses.
Scientific Review: Similar approaches could help researchers structure literature reviews, identify gaps in existing evidence, or design experiments to test competing hypotheses.
Policy Analysis: The ability to gather relevant data and structure persuasive arguments could assist in policy development and evaluation across numerous domains.
Historical Research: While the dinosaur question is speculative, similar methodologies could be applied to genuine historical controversies where evidence must be evaluated and arguments constructed.
Ethical and Epistemological Considerations
As AI systems become more capable of constructing arguments and evaluating evidence, important questions emerge:
Transparency: How can we ensure that AI-generated arguments clearly indicate their sources and limitations?
Bias Propagation: Could AI systems inadvertently reinforce existing biases in how they select and evaluate evidence?
Epistemic Authority: What happens when AI systems produce persuasive but incorrect arguments? How do we maintain appropriate skepticism toward AI-generated conclusions?
Human-AI Collaboration: The most promising applications likely involve humans and AI working together—with humans providing domain expertise, ethical judgment, and critical perspective while AI assists with data gathering, analysis, and presentation.
Conclusion: Beyond the Dinosaur Question
Ethan Mollick's experiment with GPT-5.4 Pro, Claude Opus, and Gemini DeepThink is more than an entertaining test case. As reported, it points to real progress in AI's ability to structure complex arguments, determine relevant evidence, and present conclusions in organized formats. While the specific question about dinosaur civilizations may be speculative, the underlying capabilities have serious implications for research, education, and decision-making across numerous fields.
The mention of a "harness" for Gemini DeepThink hints at a move toward more customized AI tooling designed for specific types of intellectual work. As these capabilities continue to develop, we will need to consider carefully how to integrate them into our knowledge-producing institutions while maintaining appropriate standards of evidence, transparency, and critical evaluation.
Source: Ethan Mollick's experiment as reported on X/Twitter (@emollick)





