Study: 10 Minutes with ChatGPT Cuts Problem-Solving Rate from 73% to 57%

Researchers from Carnegie Mellon, Oxford, MIT, and UCLA found that just 10 minutes of ChatGPT use reduced participants' independent problem-solving success from 73% to 57%. The effect was strongest in users who sought direct answers, whose performance fell below their original baseline.

GAla Smith & AI Research Desk·7h ago·4 min read·7 views·AI-Generated
Study: Brief ChatGPT Use Degrades Independent Problem-Solving, Performance Drops Below Baseline

A collaborative study from researchers at Carnegie Mellon University, the University of Oxford, MIT, and UCLA has provided experimental evidence that short-term interaction with ChatGPT can significantly impair a user's subsequent ability to solve problems independently. The research, involving 1,222 participants, measured a sharp decline in performance when AI assistance was removed.

What the Study Found

Participants were given problem-solving tasks. After a baseline assessment, one group used ChatGPT for 10 minutes before attempting new, similar problems without AI assistance. The control group did not use the AI.

Key Results:

  • Solve Rate Collapse: The group that used ChatGPT saw their independent problem-solving success rate drop from 73% to 57% when the AI was taken away.
  • Skip Rate Doubled: The tendency to skip or give up on problems nearly doubled compared to the control group.
  • Negative Learning Effect: For the 61% of participants who used ChatGPT primarily to get direct answers, performance fell below their original baseline. They became worse at the tasks than they were before using the tool.
  • Misplaced Confidence: Despite the measurable decline in ability, users reported feeling faster, smarter, and more productive after using ChatGPT.

The "Cognitive Debt" Mechanism

The researchers frame this effect as "cognitive debt"—the cost of outsourcing thinking to an AI. The study suggests that even very brief use can initiate a dependency cycle, where the user's own problem-solving "muscle" begins to atrophy. The most significant impairment was observed in users who adopted a "just give me the answer" approach, rather than using the AI as a tutor or brainstorming partner.

Implications for AI Integration

This study adds a critical, evidence-based caution to the rapid integration of generative AI into workflows and education. It suggests that interface design and usage patterns matter profoundly. Tools that encourage copy-paste answers may incur higher cognitive debt than those designed to scaffold thinking.

For developers and product managers, the findings highlight a need to build AI assistants that mitigate, rather than amplify, this dependency risk. Features like forced reflection steps, explanation requirements, or graduated assistance could be necessary to preserve user competency.
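
One way to make the "forced reflection" and "graduated assistance" ideas concrete is an interaction gate that withholds full answers until the user has made an attempt, escalating from reflection prompts to hints to worked steps. The sketch below is purely illustrative: the class and method names are hypothetical, the canned responses stand in for what would be model calls in a real assistant, and nothing here is drawn from the study or any particular product.

```python
# Minimal sketch of a "graduated assistance" gate. All names here
# (AssistanceLevel, HelpSession, request_help) are hypothetical.
from dataclasses import dataclass, field
from enum import IntEnum


class AssistanceLevel(IntEnum):
    REFLECT = 0      # user must restate the problem or their attempt
    HINT = 1         # assistant returns a nudge, not a solution
    WORKED_STEP = 2  # assistant reveals a single step
    FULL_ANSWER = 3  # full solution, only after the earlier levels


@dataclass
class HelpSession:
    level: AssistanceLevel = AssistanceLevel.REFLECT
    reflections: list[str] = field(default_factory=list)

    def request_help(self, user_attempt: str) -> str:
        """Escalate assistance one level per request, and only when the
        user supplies a non-trivial attempt or reflection first."""
        if len(user_attempt.strip()) < 30:
            return "Describe what you have tried so far before asking for help."
        self.reflections.append(user_attempt)
        response = self._respond(self.level)
        if self.level < AssistanceLevel.FULL_ANSWER:
            self.level = AssistanceLevel(self.level + 1)
        return response

    def _respond(self, level: AssistanceLevel) -> str:
        # In a real assistant these would be model calls with different prompts.
        return {
            AssistanceLevel.REFLECT: "Summarize the problem in your own words. What is blocking you?",
            AssistanceLevel.HINT: "Hint: re-examine the constraint you mentioned in your attempt.",
            AssistanceLevel.WORKED_STEP: "Here is the next step only, not the full solution...",
            AssistanceLevel.FULL_ANSWER: "Full worked solution, shown only after your own attempts.",
        }[level]


session = HelpSession()
print(session.request_help("I tried isolating x, but the constraint on y keeps breaking my approach."))
```

Tracking how often sessions escalate all the way to a full answer would also give product teams a rough proxy for the answer-seeking usage pattern that the study links to the steepest decline.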

gentic.news Analysis

This study provides the first large-scale, multi-institutional experimental evidence for a phenomenon that has so far been discussed mostly anecdotally: AI-induced deskilling. It moves the conversation beyond speculation about long-term effects and shows that measurable degradation can occur after as little as ten minutes of use.

This research directly intersects with our previous coverage on AI dependency in software engineering. In January 2026, we reported on GitHub Copilot's impact on code recall, where developers showed reduced ability to write boilerplate code from memory after prolonged Copilot use. The CMU/Oxford/MIT/UCLA study suggests this effect is not confined to specialized domains and can manifest almost immediately.

The findings also create a tangible benchmark for the emerging field of Human-AI Cognitive Interaction. As AI capabilities advance—exemplified by the recent launch of Google's Gemini 2.0 and its deeply integrated assistant features—understanding and measuring the human side of the interaction loop becomes commercially and ethically critical. Companies promoting AI productivity tools may soon face demands to demonstrate they do not harm user capability, similar to "digital wellness" features on smartphones.

For practitioners, the takeaway is operational: treat AI interaction design as a cognitive ergonomics problem. The choice between an AI that gives an answer and one that guides a user to an answer is now a choice with measurable consequences for user ability.

Frequently Asked Questions

How long did participants use ChatGPT in the study?

The intervention was remarkably brief—just 10 minutes of use with ChatGPT was enough to produce a statistically significant drop in subsequent independent problem-solving performance, from a 73% solve rate down to 57%.

Did all users get worse after using ChatGPT?

The negative effect was most pronounced in the 61% of participants who used ChatGPT to get direct answers. This group's performance fell below their original, pre-AI baseline. The study suggests usage pattern is a key variable in cognitive debt.

Did users realize their performance had declined?

No. A key finding was the disconnect between subjective experience and objective performance. Users reported feeling faster, smarter, and more productive after using ChatGPT, even as their actual problem-solving ability measurably worsened.

What is "cognitive debt"?

The researchers' term for the cost of outsourcing thinking to an AI. It represents the degradation of one's own problem-solving capacity that occurs when relying on an external system, analogous to technical debt in software. The "debt" comes due when the AI is unavailable and the user must perform the task alone.

AI Analysis

This study is significant because it provides controlled, experimental data for an effect that has largely been discussed theoretically or anecdotally. The participation of Carnegie Mellon, Oxford, MIT, and UCLA lends considerable weight to the findings. The speed of the effect—just 10 minutes—is startling and suggests the dependency mechanism is potent and immediate.

Technically, this research shifts the focus from pure AI capability benchmarks to human-in-the-loop system performance. For AI engineers, it underscores that user interaction design is not just a UX concern but a core component of system efficacy. An AI that improves short-term task completion while degrading long-term user capability is a flawed system.

The study also creates a potential new metric for AI assistant evaluation: cognitive preservation. Beyond accuracy and latency, future benchmarks might need to measure how well a tool maintains or improves the user's independent skill level after use. This aligns with a broader trend toward responsible AI integration, moving past raw capability to assess holistic impact.
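
The study does not define how a "cognitive preservation" score would be computed. The sketch below assumes the simplest possible formulation, the ratio of unaided post-use performance to the unaided baseline, and uses the solve rates reported in the article as a worked example. The function name and formula are illustrative assumptions, not part of the research.

```python
# Illustrative "cognitive preservation" score: the fraction of independent
# ability retained after tool use. The formula is an assumption made for
# this sketch, not something defined by the study.

def cognitive_preservation(baseline_solve_rate: float, post_use_solo_rate: float) -> float:
    """Return retained independent ability: 1.0 means no degradation,
    below 1.0 means deskilling, above 1.0 means improvement."""
    if baseline_solve_rate <= 0:
        raise ValueError("baseline_solve_rate must be positive")
    return post_use_solo_rate / baseline_solve_rate


# Using the figures reported in the article: 73% baseline vs 57% after use.
score = cognitive_preservation(0.73, 0.57)
print(f"cognitive preservation: {score:.2f}")  # ~0.78, i.e. roughly a 22% relative decline
```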
