In OpenAI's own comparative tests, the GPT-5.5 Instant model outscored doctor-written answers on accuracy, clarity, and completeness in health responses. The company reported a 71% drop in incorrect health statements over two months.
Key facts
- In OpenAI's own tests, GPT-5.5 Instant outscores doctor-written answers on accuracy, clarity, and completeness.
- Error rate on health statements dropped 71% over two months.
- 260+ doctors from 60 countries reviewed more than 700K responses.
- 230M+ weekly ChatGPT users ask health-related questions.
- Model matches top Thinking models on HealthBench at lower cost.
OpenAI has upgraded ChatGPT's healthcare capabilities with GPT-5.5 Instant, The Decoder reports. In the company's own comparative tests, the model now outscores answers written by doctors in accuracy, clarity, and completeness. The error rate for health-related statements has dropped by 71 percent, according to OpenAI.
How GPT-5.5 Instant compares to prior models
The updated model matches the performance of the most expensive Thinking models on machine-based health tests like HealthBench and HealthBench Professional, but at a fraction of the cost. GPT-5.5 Instant is available to all free ChatGPT users, though with usage limits. This represents a significant cost-performance improvement over the GPT-4o-era health capabilities, which were limited to paid tiers.
The human feedback pipeline
A network of over 260 doctors from 60 countries is behind these improvements. They have reviewed more than 700,000 model responses. According to OpenAI, more than 230 million people use ChatGPT weekly for health-related questions, such as understanding lab results, preparing for doctor's appointments, or sorting out insurance issues. OpenAI also offers specialized tools for healthcare professionals, including ChatGPT for Clinicians and OpenAI for Healthcare.

The scale of doctor-reviewed training data (700,000 responses) is notable, but the claim of beating doctors on written answers comes with a caveat: OpenAI's tests compare against generic doctor-written answers, not specialist consultations or in-person diagnosis. The company did not disclose whether the doctors knew they were being benchmarked against an AI, and it gave no details of the test methodology beyond the 71% error reduction figure.
What to watch
Watch for third-party validation of the 71% error reduction claim, ideally from a medical journal or independent audit. Also track whether GPT-5.5 Instant's health capabilities narrow the market share gap with Google's Med-PaLM 2, especially as ChatGPT's overall share dipped below 50% in June 2026.
Source: the-decoder.com







