In a concise but pointed social media post, Wharton professor and leading AI adoption researcher Ethan Mollick challenged a persistent narrative in the AI discourse. He pushed back on the comfortable habit of labeling skills like "problem solving" or "judgement" as inherently impossible for AI to master.
"I am not convinced that we should be comfortable calling 'problem solving' or 'judgement' or whatever as skills that are impossible for AI to do well," Mollick wrote. "Like any other skill, there are humans who are really good at it, but that doesn't mean that AIs don't do good judgement, etc."
This statement directly confronts a common rhetorical checkpoint in discussions about AI's limits. As language models have surpassed expectations in coding, writing, and standardized testing, a fallback position has been to reserve faculties like nuanced judgment, strategic problem-solving, and common-sense reasoning as uniquely human bastions. Mollick's argument reframes these not as mystical human exclusives, but as skills on a spectrum—skills where, like in mathematics or language, there exists a distribution of human competency and where AI systems are already demonstrating meaningful, and sometimes superior, performance.
The Argument: From Impossibility to Measurable Competence
Mollick's core argument is one of categorization. By classifying judgment and problem-solving as "skills" akin to drafting an email or writing Python code, he places them within a domain that is inherently learnable and improvable through data, feedback, and architectural innovation. This contrasts with the view that they are emergent properties of human consciousness or biological cognition that cannot be replicated by pattern-matching systems.
This perspective is grounded in observable reality. Recent AI systems are routinely deployed for tasks requiring forms of judgment:
- Strategic Problem-Solving: AlphaFold's solution to protein folding, DeepMind's AlphaCode in competitive programming, and AI agents that can execute multi-step software engineering tasks (e.g., on SWE-Bench) all demonstrate complex problem-solving that extends far beyond retrieval.
- Practical Judgment: Models like Claude 3.5 Sonnet and GPT-4o are used to critique business plans, evaluate legal arguments, and offer strategic advice—applications where users are explicitly seeking a form of reasoned judgment.
- Moral & Ethical Reasoning: While fraught, benchmarks like MMLU's humanities sections and specific ethics datasets show AI systems can apply ethical frameworks and reason about dilemmas at a high level.
The performance is not perfect, but as Mollick implies, neither is human performance. The relevant metric becomes comparative competency, not binary capability.
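That comparative framing is easy to operationalize. The sketch below is a minimal illustration, not a real benchmark: the `human_scores` and `ai_score` values are made-up placeholders, and an actual evaluation would substitute measured accuracies on a shared judgment task. The point is the output, a percentile relative to a human sample rather than a yes/no verdict.

```python
# Minimal sketch of "comparative competency, not binary capability".
# All scores below are illustrative placeholders, not real benchmark data.

def fraction_outperformed(ai_score: float, human_scores: list[float]) -> float:
    """Return the fraction of the human sample the AI score beats."""
    if not human_scores:
        raise ValueError("need at least one human baseline score")
    beaten = sum(1 for h in human_scores if ai_score > h)
    return beaten / len(human_scores)

# Hypothetical accuracies (0.0-1.0) on the same judgment task.
human_scores = [0.42, 0.55, 0.61, 0.63, 0.70, 0.74, 0.78, 0.81, 0.86, 0.93]
ai_score = 0.76

frac = fraction_outperformed(ai_score, human_scores)
print(f"AI outperforms {frac:.0%} of the human sample")
# Prints "AI outperforms 60% of the human sample": a graded comparison,
# not a binary "can / cannot" answer.
```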
Why This Reframing Matters
This shift in framing has practical implications for businesses, researchers, and policymakers.
- For Product Development: It moves the conversation from "Can AI do this?" to "How well does AI do this compared to a median human expert, and at what cost?" This is a more productive framework for building applied tools.
- For AI Safety & Alignment: If judgment is a skill, it can be trained, evaluated, and improved with targeted data and reinforcement. This offers a more concrete pathway to aligning AI systems than if judgment is considered an opaque human trait.
- For the Workforce: It undermines the notion that "judgment-based" roles are automatically safe from automation. Instead, it suggests these roles will evolve, with AI acting as a tool or collaborator that augments, and in some cases substitutes for, human judgment on specific sub-tasks.
Mollick's position aligns with a growing body of empirical results. It serves as a rebuttal to the habit of moving the goalposts each time an AI system masters a skill previously declared beyond its reach.
gentic.news Analysis

Mollick's argument is less a report on a new model and more a crucial meta-commentary on the state of AI evaluation in early 2026. It cuts directly against a trend we've chronicled: the sequential retreat of the "AI-proof" domain. First, it was creative writing, then coding, then advanced reasoning. As each fell, the definition of "true" judgment or problem-solving often became nebulous. Mollick calls for this to stop.
This connects directly to our recent coverage of Claude 3.5 Sonnet's performance on agentic coding tasks and DeepSeek-R1's near-human results on SWE-Bench. These aren't just "coding" systems; they are systems that read vague natural language requests, reason about a codebase's structure, plan a sequence of edits, and execute them—a clear manifestation of the problem-solving skill Mollick references. Similarly, our analysis of GPT-4o's multimodal reasoning capabilities shows systems making contextual judgments about images, text, and audio in real time.
The timing is significant. As the industry shifts focus from pure scale to reasoning efficiency, robustness, and specialization, understanding these capabilities as trainable skills is essential. Mollick's view supports the research direction of companies like Anthropic (with its focus on scalable oversight) and OpenAI (with its superalignment efforts), which treat advanced reasoning as an engineering problem. It also provides an intellectual foundation for the burgeoning market in AI agent platforms, which inherently assume AI can exercise judgment to navigate complex workflows.
Ultimately, Mollick is advocating for intellectual honesty. If an AI system can consistently outperform a significant portion of humans on a task requiring judgment—be it medical diagnosis triage, financial document analysis, or tactical game play—then, by any functional definition, it is demonstrating that skill. The challenge then becomes measurement, reliability, and integration, not philosophical exceptionalism.
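One way to make that "measurement and reliability" challenge concrete: if judgment is a skill, its consistency can be sampled like any other measurement. The sketch below uses a hypothetical `get_verdict` stub standing in for a real model call (here it returns a random choice so the example runs as-is); it poses the same case repeatedly and reports how often the verdicts agree.

```python
import random
from collections import Counter

def get_verdict(case: str) -> str:
    # Hypothetical stand-in for a real model call; a production version
    # would send `case` to an LLM and parse the returned judgment.
    return random.choice(["approve", "approve", "approve", "reject"])

def consistency(case: str, n_trials: int = 20) -> tuple[str, float]:
    """Sample the model's judgment n_trials times and return the
    majority verdict plus the fraction of runs that agreed with it."""
    counts = Counter(get_verdict(case) for _ in range(n_trials))
    verdict, freq = counts.most_common(1)[0]
    return verdict, freq / n_trials

verdict, agreement = consistency("Should this expense claim be approved?")
print(f"Majority verdict: {verdict} (agreement: {agreement:.0%})")
```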
Frequently Asked Questions
What did Ethan Mollick actually say about AI and judgment?
Ethan Mollick, a professor at the Wharton School, stated on social media that he is not convinced we should label skills like "problem solving" and "judgement" as impossible for AI. He argues these are skills like any other, where some humans excel, but that doesn't mean AI cannot perform them well. He is reframing these capabilities as learnable competencies rather than uniquely human traits.
How is AI currently demonstrating "judgment" or "problem-solving"?
AI demonstrates these skills in measurable ways. Examples include AI coding agents that solve complex software engineering issues by planning and executing multi-step edits (e.g., on SWE-Bench), systems like AlphaFold solving the protein folding problem, and large language models providing nuanced analysis of business strategies, legal documents, or ethical dilemmas. Performance is benchmarked and compared to human baselines, treating judgment as a quantifiable skill.
Why does it matter if we call AI's abilities "judgment"?
The terminology shapes perception, research, and business strategy. Calling it "judgment" moves the discussion from a binary "can/cannot" to a spectrum of competency. This makes it a target for improvement through training and engineering, influences how businesses integrate AI into decision-making roles, and challenges the assumption that certain knowledge-worker jobs are immune to automation because they require human judgment.
Who is Ethan Mollick and why is his view significant?
Ethan Mollick is a professor at the Wharton School of the University of Pennsylvania who specializes in innovation, entrepreneurship, and the impact of AI on work and education. He is a leading voice on practical AI adoption. His significance lies in applying a grounded, empirical lens to AI's capabilities, cutting through both hype and fear. His argument carries weight because it is based on observing real-world AI performance, not abstract theory.