Study Reveals Critical Flaws in AI Medical Triage: ChatGPT Misses Over Half of Emergencies
A groundbreaking study from the Icahn School of Medicine at Mount Sinai has revealed alarming deficiencies in artificial intelligence's ability to handle medical emergencies. Researchers found that OpenAI's ChatGPT provided incorrect advice in over 50% of emergency medical scenarios tested, marking the first published evaluation of large language models' performance in this critical domain.
The Mount Sinai Study Methodology
The research team presented ChatGPT with 60 different medical emergency scenarios designed to test the AI's triage capabilities. These scenarios covered a range of urgent medical situations where timely, accurate advice could mean the difference between life and death. The study represents the most comprehensive published assessment to date of how general-purpose AI chatbots perform when faced with genuine medical crises.
According to the findings, ChatGPT consistently "under-triaged" emergency situations—meaning the system recommended lower-acuity care when the correct clinical response would have been immediate emergency assessment. This pattern of underestimating severity poses significant risks to users who might rely on AI for medical guidance.
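To make the under-triage concept concrete, here is a minimal, hypothetical sketch of how an evaluation like this might score a chatbot's recommendations against clinician-assigned acuity levels. The acuity scale, scenario labels, and example rows below are illustrative assumptions, not the Mount Sinai study's actual protocol or data; in the study, the analogous comparison was made against the clinically correct response for each of the 60 scenarios.

```python
# Hypothetical illustration of scoring "under-triage" in a chatbot evaluation.
# Acuity scale (illustrative, not the study's): higher number = more urgent.
ACUITY = {"self_care": 0, "routine_visit": 1, "urgent_care": 2, "emergency": 3}

# Each record pairs the clinician-assigned (correct) disposition with the
# disposition the chatbot recommended for the same scenario. These rows are
# invented for demonstration only.
results = [
    {"scenario": "crushing chest pain",   "correct": "emergency", "chatbot": "urgent_care"},
    {"scenario": "minor ankle sprain",    "correct": "self_care", "chatbot": "self_care"},
    {"scenario": "sudden slurred speech", "correct": "emergency", "chatbot": "routine_visit"},
]

# A case is under-triaged when the chatbot recommends a lower acuity level
# than the clinically correct disposition.
under_triaged = [r for r in results if ACUITY[r["chatbot"]] < ACUITY[r["correct"]]]

rate = len(under_triaged) / len(results)
print(f"Under-triage rate: {rate:.0%}")  # e.g. 67% for these made-up rows
```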
Why AI Struggles with Medical Emergencies
Clinical evaluations of health-focused AI models have identified several troubling gaps in their ability to recognize when people need urgent care. The systems appear particularly challenged by:
- Atypical presentations of common conditions
- Rapidly evolving medical situations that require dynamic assessment
- Scenarios where users omit critical details in their descriptions
- Conditions with overlapping symptoms that require differential diagnosis
The fundamental issue appears to be that current AI systems, while impressive in their language capabilities, lack the clinical reasoning, contextual understanding, and pattern recognition that experienced medical professionals develop through years of training and practice.
The Real-World Implications
The study's findings carry profound implications for public health and AI safety. When AI systems misclassify severe symptoms as non-urgent, they can inadvertently delay life-saving treatment. This is particularly concerning given the growing trend of people turning to AI chatbots for health information and advice.
"AI triage tools can appear helpful for routine questions," notes the research, "but misclassifying severe symptoms can delay life‑saving treatment." Developers and independent researchers have both flagged the potential for harm if such tools are used as a substitute for professional medical evaluation.
Contrast with Specialized Medical AI
While general-purpose chatbots like ChatGPT struggle with medical emergencies, the AI landscape includes more specialized approaches. On the same day the Mount Sinai study was reported, Microsoft released Phi-4-reasoning-vision-15B, a 15-billion-parameter multimodal reasoning model designed specifically for tasks requiring perception and selective reasoning, with particular strengths in scientific and mathematical domains.
This contrast highlights an important distinction in AI development: general-purpose conversational AI versus specialized systems built for specific domains. The Microsoft model represents a more targeted approach to complex reasoning tasks, though it's not designed for medical triage either.
Practical Guidance for the Public
Based on the research findings, medical professionals and AI safety experts offer several clear recommendations:
- Err on the side of caution: Seek emergency care or call emergency services for symptoms like chest pain, sudden severe shortness of breath, severe bleeding, sudden weakness or confusion, and other recognized red flags.
- Use AI tools only as informational adjuncts: They can support basic health education but should never replace clinical judgment.
- Verify advice with professionals: Always follow up with a licensed clinician, especially when symptoms are worsening or unclear.
Regulatory and Development Challenges
The study arrives amid growing scrutiny of AI safety practices across the industry. Recent reports indicate that OpenAI has released frontier AI models without safety evaluations or system cards, breaking from previous transparency commitments. This pattern raises questions about whether adequate safeguards are being implemented for AI systems that people might use in high-stakes situations.
Regulators and health systems are still developing frameworks for evaluating and deploying AI in medical contexts. The Mount Sinai findings will likely accelerate these efforts and prompt more rigorous testing requirements for AI systems that could be used for health advice.
The Path Forward for AI in Medicine
Despite the concerning findings, researchers acknowledge that AI has legitimate applications in healthcare—just not in emergency triage. Potential appropriate uses include:
- Basic health education and information
- Administrative support in clinical settings
- Medical documentation assistance
- Non-urgent symptom checking with clear disclaimers
The key distinction lies in recognizing AI's current limitations and ensuring proper human oversight, particularly in high-stakes medical situations.
Source: Forbes, March 8, 2026, "ChatGPT Provided Wrong Advice In Over 50% Of Medical Emergencies Tested" by Bruce Lee, with additional context from clinical evaluations of health-focused AI models.