AI Models Investigate Prehistoric Mysteries: How GPT-5.4, Claude Opus, and Gemini DeepThink Tackled the Dinosaur Civilization Question

Leading AI models including GPT-5.4 Pro, Claude Opus, and Gemini DeepThink were challenged to investigate whether advanced dinosaur civilizations existed. The experiment reveals how modern AI systems approach complex historical questions with original analysis and data gathering capabilities.

Ala AYADI & AI Research Desk · Mar 5, 2026 · 6 min read · AI-Generated

Source: x.com via @emollick (single source)

In a fascinating experiment that blends artificial intelligence with paleontological inquiry, researcher Ethan Mollick recently challenged three leading AI models—GPT-5.4 Pro, Claude Opus, and Gemini DeepThink—with an unusual task: "Prove to me in a PowerPoint that there was no advanced dinosaur civilization by downloading whatever data you think appropriate & running tests." This seemingly whimsical prompt has revealed significant insights about how modern AI systems approach complex historical questions and conduct original research.

The Experimental Setup

The challenge presented to these AI systems wasn't merely a test of factual knowledge about dinosaur history, but rather an examination of how these models can structure arguments, gather evidence, and present conclusions in a coherent format. According to Mollick's reporting on the experiment, each model was asked to create a PowerPoint presentation that would systematically address the question of whether advanced dinosaur civilizations could have existed.

What makes this experiment particularly noteworthy is the instruction to prove the claim "by downloading whatever data you think appropriate & running tests." This directive pushes beyond simple information retrieval into the realm of original analysis: it asks the AI to determine what evidence would be relevant, how to obtain it, and what analytical methods would be appropriate for evaluating that evidence.

Divergent Approaches and Capabilities

Mollick's findings indicate that GPT-5.4 Pro and Claude Opus both engaged in what he describes as "original analyses" of the question. Rather than simply regurgitating established scientific consensus, these models appeared to structure logical arguments about what evidence would be necessary to prove or disprove the existence of an advanced dinosaur civilization.

Their approaches likely included considerations of:

Geological evidence and fossil records
Archaeological markers of civilization (tools, structures, artifacts)
Chemical and isotopic signatures in rock layers
Comparative analysis with known civilizations
Statistical evaluation of absence of evidence versus evidence of absence
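
The last item in that list, absence of evidence versus evidence of absence, can be made concrete with Bayes' rule. The following is a minimal sketch and not a reconstruction of anything the models actually ran: the function name, the prior, and the per-search detection probability are all hypothetical, chosen only for illustration, and the model assumes independent searches with no false positives.

```python
def posterior_given_no_evidence(prior: float, p_detect: float, n_searches: int) -> float:
    """Return P(hypothesis | nothing found in n independent searches).

    prior      -- assumed prior probability that the hypothesis is true
    p_detect   -- assumed chance that one search finds evidence if it is true
    n_searches -- number of independent searches that found nothing
    """
    # Probability of coming up empty in every search if the hypothesis is true.
    miss_if_true = (1.0 - p_detect) ** n_searches
    # If the hypothesis is false, coming up empty is certain
    # (simplifying assumption: no false positives).
    miss_if_false = 1.0
    numerator = prior * miss_if_true
    return numerator / (numerator + (1.0 - prior) * miss_if_false)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for n in (0, 10, 100, 1000):
        p = posterior_given_no_evidence(prior=0.01, p_detect=0.01, n_searches=n)
        print(f"{n:>5} empty searches -> posterior {p:.6f}")
```

Under these assumptions, a handful of empty searches barely moves the posterior when each search is unlikely to detect anything, while many empty searches drive it toward zero. That is the sense in which an absence of evidence can accumulate into evidence of absence.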

The most intriguing development, however, came from Gemini DeepThink, for which "someone build a harness." This suggests that researchers have developed specialized tools or frameworks to enhance DeepThink's analytical capabilities, potentially allowing for more sophisticated data gathering, hypothesis testing, or presentation generation than would be possible through standard interfaces.

Implications for AI Research Methodology

This experiment represents more than just an amusing thought exercise—it demonstrates several important developments in AI capabilities:

1. Complex Problem Structuring: The models weren't just answering a question but designing an entire research methodology. This indicates progress toward AI systems that can plan and execute multi-step analytical processes rather than simply responding to direct queries.

2. Evidence Evaluation: By asking the AI to determine what data would be appropriate, the experiment tests how well these systems understand evidentiary standards and can distinguish between relevant and irrelevant information.

3. Scientific Argumentation: Creating a persuasive PowerPoint requires not just facts but logical structure, appropriate visualizations, and coherent narrative—skills that extend beyond simple information retrieval.

4. Interdisciplinary Thinking: The question inherently bridges paleontology, archaeology, geology, and even speculative history, requiring the AI to integrate knowledge across domains.

The Broader Context of AI in Scientific Inquiry

This experiment occurs against a backdrop of increasing AI involvement in scientific research. Recent years have seen AI systems:

Predicting protein structures with AlphaFold
Discovering new materials through computational screening
Analyzing astronomical data for exoplanet detection
Assisting with literature reviews and hypothesis generation

What makes Mollick's experiment distinctive is its focus on historical rather than experimental sciences, and its emphasis on argument construction rather than pure data analysis. The ability to create persuasive presentations based on evidence evaluation represents a different type of intellectual labor than pattern recognition in large datasets.

Technical Considerations and Limitations

While the results are impressive, several important questions remain about what exactly these AI systems accomplished:

Data Access Limitations: When instructed to "download whatever data you think appropriate," what actual data sources were available to these models? Were they limited to publicly available datasets, or did they simulate data gathering?

Analytical Depth: How sophisticated were the "tests" these models proposed? Did they suggest statistical analyses, comparative studies, or experimental approaches that go beyond surface-level examination?

Originality vs. Recombination: To what extent were the models' outputs truly original analyses versus clever recombinations of existing arguments about dinosaur civilization hypotheses?

Evaluation Framework: Without seeing the actual PowerPoint outputs, it's difficult to assess the quality of the arguments presented. What would constitute a "good" proof that no advanced dinosaur civilization existed?

Future Directions and Applications

The techniques demonstrated in this experiment have potential applications far beyond prehistoric mysteries:

Educational Tools: AI systems that can structure arguments and gather evidence could become powerful teaching assistants for critical thinking and research methodology courses.

Scientific Review: Similar approaches could help researchers structure literature reviews, identify gaps in existing evidence, or design experiments to test competing hypotheses.

Policy Analysis: The ability to gather relevant data and structure persuasive arguments could assist in policy development and evaluation across numerous domains.

Historical Research: While the dinosaur question is speculative, similar methodologies could be applied to genuine historical controversies where evidence must be evaluated and arguments constructed.

Ethical and Epistemological Considerations

As AI systems become more capable of constructing arguments and evaluating evidence, important questions emerge:

Transparency: How can we ensure that AI-generated arguments clearly indicate their sources and limitations?

Bias Propagation: Could AI systems inadvertently reinforce existing biases in how they select and evaluate evidence?

Epistemic Authority: What happens when AI systems produce persuasive but incorrect arguments? How do we maintain appropriate skepticism toward AI-generated conclusions?

Human-AI Collaboration: The most promising applications likely involve humans and AI working together—with humans providing domain expertise, ethical judgment, and critical perspective while AI assists with data gathering, analysis, and presentation.

Conclusion: Beyond the Dinosaur Question

Ethan Mollick's experiment with GPT-5.4 Pro, Claude Opus, and Gemini DeepThink represents more than just an entertaining test case. It demonstrates significant progress in AI's ability to structure complex arguments, determine relevant evidence, and present conclusions in organized formats. While the specific question about dinosaur civilizations may be speculative, the underlying capabilities have serious implications for research, education, and decision-making across numerous fields.

The development of specialized tools like the "harness" for Gemini DeepThink suggests that we're moving toward more customized AI systems designed for specific types of intellectual work. As these capabilities continue to develop, we'll need to thoughtfully consider how to integrate them into our knowledge-producing institutions while maintaining appropriate standards of evidence, transparency, and critical evaluation.

Source: Ethan Mollick's experiment as reported on X/Twitter (@emollick)

Source: gentic.news · Mar 5, 2026 · Ala AYADI

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala AYADI.

AI Analysis

This experiment represents a significant milestone in AI development for several reasons. First, it demonstrates movement beyond simple question-answering toward complex problem-solving that involves planning, evidence evaluation, and structured argumentation. The instruction to create a PowerPoint presentation requires the AI to consider audience, narrative flow, and visual communication, skills that have traditionally been challenging for AI systems.

Second, the mention of a specialized "harness" for Gemini DeepThink suggests an emerging trend toward customized tools that enhance specific AI capabilities. This represents a shift from general-purpose models toward specialized systems optimized for particular types of intellectual work. Such developments could lead to more effective human-AI collaboration in research settings.

Most importantly, this experiment highlights how AI is developing capabilities in the humanities and historical sciences, not just in STEM fields. The ability to construct arguments about historical questions requires understanding of evidence standards, logical reasoning, and narrative construction, skills that are fundamentally different from pattern recognition in large datasets. This suggests that AI's impact may be broader than previously anticipated, potentially affecting fields that rely on argumentation and interpretation as much as computation and prediction.
