A randomized trial of an AI therapy chatbot on Mexican women found a 0.3 standard deviation improvement in mental health over 6 months. The intervention also improved sleep, health behaviors, daily functioning, and labor market outcomes, with no increase in severe cases.
Key facts
- 0.3 SD mental health improvement over 6 months
- No evidence of increased severe cases
- Improved sleep, health behaviors, daily functioning
- Measurable labor market outcome gains
- RCT conducted on Mexican women
A large-scale randomized controlled trial tested an AI therapy chatbot on Mexican women and reported a 0.3 standard deviation improvement in mental health over six months [According to @emollick]. The effect size is comparable to many in-person therapy interventions in similar populations, yet delivered at near-zero marginal cost per user.
Key outcomes from the trial include improved sleep quality, increased healthful behaviors, better daily functioning, and measurable gains in labor market outcomes. Critically, the study found no evidence of an increase in severe cases, addressing a common safety concern about AI-delivered mental health tools.
The trial provides some of the strongest causal evidence for AI-delivered mental health tools outside of high-income settings. Most prior RCTs on AI chatbots for mental health have been small, underpowered, or conducted in Western populations with high baseline digital literacy.
One notable point: the 0.3 SD effect is particularly striking because it was achieved in a population with limited prior exposure to digital therapeutic interventions, suggesting the effect may generalize to other low- and middle-income countries where mental health infrastructure is scarce.
However, the source tweet does not disclose the sample size, exact chatbot name, or whether the trial was pre-registered. The study appears to be published, but the paper link was not provided in the source material. Caution is warranted until full methodological details are available.
How the effect size compares
A 0.3 SD improvement is clinically meaningful. By comparison, common antidepressant medications show effect sizes of 0.2–0.5 SD in meta-analyses. That the chatbot's effect falls squarely within that range is notable given its scalability and low cost.
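To make the effect-size comparison concrete, here is a minimal sketch of how a standardized mean difference (Cohen's d, the usual "SD units" measure) is computed. The group means, SDs, and sizes below are purely illustrative assumptions, not figures from the trial:

```python
import math

def cohens_d(mean_treat, mean_control, sd_treat, sd_control, n_treat, n_control):
    """Standardized mean difference (Cohen's d) using the pooled standard deviation."""
    pooled_var = (
        (n_treat - 1) * sd_treat**2 + (n_control - 1) * sd_control**2
    ) / (n_treat + n_control - 2)
    return (mean_treat - mean_control) / math.sqrt(pooled_var)

# Hypothetical example: a 3-point gain on a scale whose SD is about 10
# corresponds to an effect of 0.3 SD.
d = cohens_d(mean_treat=53.0, mean_control=50.0,
             sd_treat=10.0, sd_control=10.0,
             n_treat=600, n_control=600)
print(round(d, 2))  # 0.3
```

The takeaway: "0.3 SD" is scale-independent, which is why it can be compared directly against meta-analytic effect sizes for antidepressants or in-person therapy.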
What the trial didn't report
The source does not mention dropout rates, whether the chatbot used CBT or another therapeutic framework, or the frequency of use required to achieve the effect. These details are critical for replication and for determining whether the intervention is truly scalable.
What to watch
Watch for the full paper publication with sample size, dropout rates, and chatbot architecture details. A replication trial in another LMIC (e.g., India or Kenya) would significantly strengthen the evidence base. Also monitor whether any major health system adopts the chatbot for pilot deployment.