In a recent social media post, Wharton professor and AI adoption researcher Ethan Mollick distilled a critical challenge for organizations integrating AI: its "jagged intelligence" is fundamentally harder to manage than the varied skill sets of human employees. Mollick, author of Co-Intelligence, outlined three specific reasons why.
What Mollick Said
Mollick's post lists three core issues:
- Weaknesses are not always intuitive or identifiable in advance. Unlike a human employee whose gaps in knowledge or skill might be predictable (e.g., a junior analyst may struggle with advanced statistics), an AI's failures can be surprising and emerge only in specific, often critical, contexts. It might ace a complex logic puzzle but fail on a simple arithmetic task embedded within it.
- All LLMs have similar weaknesses, so you can't just hire a different one. In a human team, managers can hire complementary specialists. If one person is weak at data visualization, you hire someone strong in it. With AI, switching from GPT-4 to Claude 3 or Gemini often means encountering the same types of failure modes, as the models are built on similar architectures and trained on similar data. There is no easy "hire" to patch the hole.
- The jagged frontier is moving outward. The landscape of AI capabilities is not static. What a model fails at today, it might succeed at in next month's update. This constant, rapid evolution makes it difficult to establish stable, reliable processes or guardrails, as the system's competency profile is a moving target.
The Core Problem: Unmanageable Uncertainty
Mollick's framing shifts the discussion from raw benchmark performance to the practicalities of management and operational risk. The issue isn't just that AI is uneven in its capabilities—humans are too—but that this unevenness possesses qualities that defy standard managerial responses.
You cannot reliably train for, hire around, or process-map your way out of AI's specific weaknesses in the same way you can with a human team. The weaknesses are opaque, systemic across vendors, and transient. This creates a unique form of operational uncertainty where the failure modes of a core component are both unknown and unstable.
What This Means in Practice
For technical leaders, this analysis validates the necessity of robust, layered oversight for any AI-augmented workflow. It argues against a simple "prompt and pray" integration. Effective use requires:
- Extensive, continuous testing across the actual task distribution, not just standard benchmarks (a minimal sketch of such a check follows this list).
- Human-in-the-loop systems designed to catch non-intuitive failures, not just to review outputs.
- Process flexibility to adapt as model capabilities and weaknesses shift with updates.
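As a rough illustration of the first two requirements, here is a minimal Python sketch of a task-distribution check that routes critical failures to human review. The Task structure, the stubbed call_model function, and the example prompts are assumptions made for illustration, not any vendor's API or a production harness.

```python
# Illustrative sketch: run the model over tasks sampled from the team's real
# workflow (not a public benchmark) and route critical failures to human review.
# `call_model`, `Task`, and the example prompts are placeholders.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # domain-specific validator for the model's output
    critical: bool = False        # failures on critical tasks go to a human reviewer

def call_model(prompt: str) -> str:
    """Stand-in for whatever model client the team actually uses."""
    return "stub output"

def run_suite(tasks: list[Task]) -> dict:
    """Score every task and collect critical failures for human follow-up."""
    results = {"passed": 0, "failed": 0, "needs_human_review": []}
    for task in tasks:
        output = call_model(task.prompt)
        if task.check(output):
            results["passed"] += 1
        else:
            results["failed"] += 1
            if task.critical:
                results["needs_human_review"].append(task.prompt)
    return results

# Re-run the same suite after every model or prompt update to catch capability shifts.
tasks = [
    Task(prompt="Summarize this contract clause: ...",
         check=lambda out: len(out.strip()) > 0, critical=True),
    Task(prompt="What is 17 * 23?",
         check=lambda out: "391" in out),
]
print(run_suite(tasks))
```

The point of the sketch is the shape of the process, not the specifics: the suite is drawn from the actual task distribution, re-run on every model or prompt update, and wired to escalate surprising failures rather than silently averaging them away.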
gentic.news Analysis
Mollick's observation connects directly to a growing body of empirical findings we've covered. Our December 2025 analysis of the VibeCoder study showed AI coding assistants producing subtle security vulnerabilities that human reviewers consistently missed—a perfect example of a non-intuitive weakness. Furthermore, the industry-wide struggle with AI "hallucination," a uniform weakness across all major LLMs, underscores point two. As we noted in our Q4 2025 roundup, despite claims of reduced hallucination rates from OpenAI, Anthropic, and Google, the fundamental tendency to confabulate remains a shared, systemic limitation.
The moving frontier (point three) is evidenced by the relentless release cadence tracked in our model timeline. Since our last major update, we've seen GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro all push capabilities outward, but often in different, unpredictable directions. This aligns with Mollick's broader research theme, explored in his 2024 book Co-Intelligence, that successful AI integration is less about tool mastery and more about adapting human organizational structures to partner with a fundamentally alien form of intelligence. His post serves as a crucial reminder that as the technology advances, the core management challenge may not simplify but will instead grow more complex.
Frequently Asked Questions
What is "jagged intelligence" in AI?
Jagged intelligence refers to the uneven capability profile of large language models. An AI might excel at creative writing and legal analysis but fail at basic middle-school math or consistent logical deduction. Its performance is highly task-dependent in a way that doesn't always match human expectations of correlated skills.
If all LLMs have similar weaknesses, does choosing a model matter?
Yes, but not as a solution to specific weaknesses. While failure modes are often similar, the probability and severity of failures can differ significantly between models. Choosing a model involves evaluating which one's particular strength profile best aligns with your primary use case and which one's failure tendencies are less damaging in your specific context. You are selecting a different risk profile, not eliminating a risk category.
How can teams mitigate the risks of AI's jagged intelligence?
Mitigation requires a systemic approach: implement rigorous validation checkpoints where outputs are verified against known facts or by domain experts; design processes where AI outputs are treated as drafts or components, not final products; and cultivate a team culture of vigilant skepticism, training staff to recognize common and uncommon failure patterns. Redundancy and human oversight remain essential.
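As a rough illustration of the "drafts, not final products" principle, here is a minimal Python sketch of an approval gate in which AI output cannot be released until a human reviewer signs off. The AIDraft type, require_approval, and publish are illustrative names invented for this sketch, not a reference to any existing tooling.

```python
# Minimal sketch of an approval gate: AI output is treated as a draft that
# cannot be published until a human reviewer explicitly signs off.
# All names here (AIDraft, require_approval, publish) are illustrative.

from dataclasses import dataclass, field

@dataclass
class AIDraft:
    content: str
    source_model: str
    approved: bool = False
    reviewer_notes: list[str] = field(default_factory=list)

def require_approval(draft: AIDraft, reviewer_ok: bool, notes: str = "") -> AIDraft:
    """Record the reviewer's decision; only an explicit sign-off marks the draft approved."""
    if notes:
        draft.reviewer_notes.append(notes)
    draft.approved = reviewer_ok
    return draft

def publish(draft: AIDraft) -> str:
    """Refuse to release anything that has not passed human review."""
    if not draft.approved:
        raise PermissionError("AI draft has not been approved by a human reviewer")
    return draft.content

# Usage: the draft flows through human review before it can be published.
draft = AIDraft(content="Generated summary ...", source_model="any-llm")
draft = require_approval(draft, reviewer_ok=True, notes="Checked figures against source data")
print(publish(draft))
```

The design choice worth noting is that the gate defaults to refusal: nothing ships unless a reviewer has explicitly approved it, which is the inverse of a workflow where AI output is published unless someone objects.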
Is AI's jagged frontier likely to stabilize?
In the short to medium term, no. The field is in a phase of rapid, foundational development. As new architectures (like state-space models), training techniques, and multimodal capabilities emerge, the shape of the capability frontier will continue to shift. Long-term stabilization would likely require the field to converge on a dominant paradigm and enter a period of incremental refinement, which is not the current state of AI research.