What Happened
Wharton professor and AI researcher Ethan Mollick ran an experiment in which he used an OpenAI model, identified in his tweet as "GPT-5.4 Pro," to research the historical impact of Roman aqueducts on labor displacement. Mollick prompted the AI to analyze this case study and produce a "METR-style graph," a reference to the visual style that the AI research organization METR (formerly ARC Evals) uses in its AI capability forecasting reports.
The AI generated a graph showing labor displacement over time, with Mollick noting two key findings from the analysis:
- The displacement followed an exponential pattern with a specific "doubling time" (the exact timeframe wasn't specified in the tweet)
- The exponential growth eventually transitioned into an S-curve pattern as displacement saturated
Mollick added two interpretive lessons from this historical case:
- "Displacing terrible work is good" — referring to the grueling manual water-carrying labor that aqueducts eliminated
- "All exponentials become s-curves in the end" — noting that even rapid technological displacement eventually reaches saturation points
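The link between a fixed doubling time and eventual saturation can be illustrated with a logistic curve, which grows almost exponentially at first and then flattens as it nears a ceiling. The Python sketch below is only an illustration under stated assumptions: the ceiling, doubling time, and midpoint are hypothetical placeholder values, not figures from Mollick's graph or from the AI's research, and the logistic form is one common S-curve model rather than the one the AI necessarily used.

```python
import math


def logistic(t, ceiling, doubling_time, midpoint):
    """Value of a logistic (S-curve) at time t.

    The growth rate is set so that, well before the midpoint,
    the curve doubles roughly every `doubling_time` time units.
    """
    rate = math.log(2) / doubling_time
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))


# Hypothetical placeholder values (NOT taken from Mollick's graph).
CEILING = 100.0        # saturation level, e.g. percent of work displaced
DOUBLING_TIME = 10.0   # time units per doubling in the early phase
MIDPOINT = 100.0       # time at which half the ceiling is reached


def growth_ratio(t):
    """How much the curve multiplies over one doubling period starting at t."""
    later = logistic(t + DOUBLING_TIME, CEILING, DOUBLING_TIME, MIDPOINT)
    now = logistic(t, CEILING, DOUBLING_TIME, MIDPOINT)
    return later / now


early_ratio = growth_ratio(0)     # far below the ceiling: close to 2
late_ratio = growth_ratio(200)    # near the ceiling: close to 1

print(f"early growth per period: {early_ratio:.3f}")
print(f"late growth per period:  {late_ratio:.3f}")
```

With these placeholder values, the curve roughly doubles per period while displacement is far below the ceiling, and the per-period ratio falls toward 1 as displacement saturates. That is the pattern behind the remark that exponentials eventually become S-curves.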
Context & Verification
Mollick explicitly noted that he performed "spot checks" on the AI's research and found that it "seemed accurate." This verification step matters: the AI conducted the initial research and analysis, but a human still had to validate the findings, and spot checks cover only a sample of the claims.
The experiment builds on Mollick's ongoing work exploring how advanced AI models can augment research capabilities. As a professor at Wharton who frequently writes about AI's impact on work and education, Mollick has been testing frontier models' abilities to assist with complex analytical tasks that traditionally require specialized historical and economic expertise.
The reference to "METR-style" graphs connects this historical analysis to contemporary AI forecasting. METR produces influential reports tracking AI capabilities, often using exponential growth curves to model progress. By applying the same analytical framing to a historical technology, Mollick draws a parallel between past technological transitions and current AI development trajectories.
Technical Note on the Model
Mollick's tweet names the model as "GPT-5.4 Pro," which may be a typographical error for GPT-4o Pro, OpenAI's current flagship multimodal model. The "o" in GPT-4o stands for "omni," referring to its ability to process and generate text, audio, and visual content. The Pro version offers higher rate limits and priority access to new features.
This model choice is significant because GPT-4o is one of the most capable publicly available AI systems for complex reasoning tasks. Its ability to research historical economic patterns suggests growing competency in synthesizing information across domains: in this case, historical data, economic theory, and data visualization.



