Study Reveals Critical Flaws in AI Medical Triage: ChatGPT Misses Over Half of Emergencies
A groundbreaking study from the Icahn School of Medicine at Mount Sinai has revealed alarming deficiencies in artificial intelligence's ability to handle medical emergencies. Researchers found that OpenAI's ChatGPT provided incorrect advice in over 50% of emergency medical scenarios tested, marking the first published evaluation of large language models' performance in this critical domain.
The Mount Sinai Study Methodology
The research team presented ChatGPT with 60 different medical emergency scenarios designed to test the AI's triage capabilities. These scenarios covered a range of urgent medical situations where timely, accurate advice could mean the difference between life and death. The study represents the most comprehensive published assessment to date of how general-purpose AI chatbots perform when faced with genuine medical crises.
According to the findings, ChatGPT consistently "under-triaged" emergency situations—meaning the system recommended lower-acuity care when the correct clinical response would have been immediate emergency assessment. This pattern of underestimating severity poses significant risks to users who might rely on AI for medical guidance.
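To make the under-triage concept concrete, here is a minimal, hypothetical sketch of how an evaluation like this might score a chatbot's recommendations against clinician-assigned acuity levels. The acuity scale, scenario labels, and example rows below are illustrative assumptions, not the Mount Sinai study's actual protocol or data; in the study, the analogous comparison was made against the clinically correct response for each of the 60 scenarios.

```python
# Hypothetical illustration of scoring "under-triage" in a chatbot evaluation.
# Acuity scale (illustrative, not the study's): higher number = more urgent.
ACUITY = {"self_care": 0, "routine_visit": 1, "urgent_care": 2, "emergency": 3}

# Each record pairs the clinician-assigned (correct) disposition with the
# disposition the chatbot recommended for the same scenario. These rows are
# invented for demonstration only.
results = [
    {"scenario": "crushing chest pain",   "correct": "emergency", "chatbot": "urgent_care"},
    {"scenario": "minor ankle sprain",    "correct": "self_care", "chatbot": "self_care"},
    {"scenario": "sudden slurred speech", "correct": "emergency", "chatbot": "routine_visit"},
]

# A case is under-triaged when the chatbot recommends a lower acuity level
# than the clinically correct disposition.
under_triaged = [r for r in results if ACUITY[r["chatbot"]] < ACUITY[r["correct"]]]

rate = len(under_triaged) / len(results)
print(f"Under-triage rate: {rate:.0%}")  # e.g. 67% for these made-up rows
```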
Why AI Struggles with Medical Emergencies
Clinical evaluations of health-focused AI models have identified several troubling gaps in their ability to recognize when people need urgent care. The systems appear particularly challenged by:
- Atypical presentations of common conditions
- Rapidly evolving medical situations that require dynamic assessment
- Scenarios where users omit critical details in their descriptions
- Conditions with overlapping symptoms that require differential diagnosis
The fundamental issue appears to be that current AI systems, while impressive in their language capabilities, lack the clinical reasoning, contextual understanding, and pattern recognition that experienced medical professionals develop through years of training and practice.
The Real-World Implications
The study's findings carry profound implications for public health and AI safety. When AI systems misclassify severe symptoms as non-urgent, they can inadvertently delay life-saving treatment. This is particularly concerning given the growing trend of people turning to AI chatbots for health information and advice.
"AI triage tools can appear helpful for routine questions," notes the research, "but misclassifying severe symptoms can delay life‑saving treatment." Developers and independent researchers have both flagged the potential for harm if such tools are used as a substitute for professional medical evaluation.
Contrast with Specialized Medical AI
While general-purpose chatbots like ChatGPT struggle with medical emergencies, the AI landscape includes more specialized approaches. On the same day the Mount Sinai study was reported, Microsoft released Phi-4-reasoning-vision-15B, a 15-billion-parameter multimodal reasoning model designed specifically for tasks requiring perception and selective reasoning, with particular strengths in scientific and mathematical domains.
This contrast highlights an important distinction in AI development: general-purpose conversational AI versus specialized systems built for specific domains. The Microsoft model represents a more targeted approach to complex reasoning tasks, though it's not designed for medical triage either.
Practical Guidance for the Public
Based on the research findings, medical professionals and AI safety experts offer several clear recommendations:
- Err on the side of caution: Seek emergency care or call emergency services for symptoms like chest pain, sudden severe shortness of breath, severe bleeding, sudden weakness or confusion, and other recognized red flags.
- Use AI tools only as informational adjuncts: They can support basic health education but should never replace clinical judgment.
- Verify advice with professionals: Always follow up with a licensed clinician, especially when symptoms are worsening or unclear.
Regulatory and Development Challenges
The study arrives amid growing scrutiny of AI safety practices across the industry. Recent reports indicate that OpenAI has released frontier AI models without safety evaluations or system cards, breaking from previous transparency commitments. This pattern raises questions about whether adequate safeguards are being implemented for AI systems that people might use in high-stakes situations.
Regulators and health systems are still developing frameworks for evaluating and deploying AI in medical contexts. The Mount Sinai findings will likely accelerate these efforts and prompt more rigorous testing requirements for AI systems that could be used for health advice.
The Path Forward for AI in Medicine
Despite the concerning findings, researchers acknowledge that AI has legitimate applications in healthcare—just not in emergency triage. Potential appropriate uses include:
- Basic health education and information
- Administrative support in clinical settings
- Medical documentation assistance
- Non-urgent symptom checking with clear disclaimers
The key distinction lies in recognizing AI's current limitations and ensuring proper human oversight, particularly in high-stakes medical situations.
Source: Forbes, March 8, 2026, "ChatGPT Provided Wrong Advice In Over 50% Of Medical Emergencies Tested" by Bruce Lee, with additional context from clinical evaluations of health-focused AI models.