Mira Murati's Thinking Machines built a system that makes 29.8% fewer errors than the best frontier model, using Bridgewater's private expert judgments. It does so at 13.8x lower inference cost, after training on expert-labeled finance documents.
Key facts
- 29.8% fewer errors than the best frontier model.
- 13.8x lower inference cost.
- Naive prompts yield 46-50% accuracy, expert prompts 74-78%.
- Training used CISPO loss, proposed by MiniMax in 2025.
- Bridgewater provided high-quality expert labels.
Mira Murati's Thinking Machines made Bridgewater's private expert judgment trainable and ended up with 29.8% fewer errors than the best frontier model, according to @rohanpaul_ai. With naive prompts, all tested models sit around coin-flip accuracy, roughly 46% to 50%. Expert prompts lift them sharply, to about 74% to 78% average accuracy.
The workflow filters finance articles, reports, central-bank documents, and emails to decide what investors should read. This is a serious signal for enterprise AI: bringing private expert judgment into the loop beats general intelligence alone.
The taste problem
The problem was not reading finance documents, because frontier LLMs can already read them. The harder task was deciding which facts deserve attention inside an investor's workflow. A tariff headline can move markets, while another geopolitical headline may add no signal.
The breakthrough came from replacing written rules with high-quality labels from expert investors. Non-expert labels failed because the task depends on taste, not surface financial language. Bridgewater cleaned the expert labels by sending cases the model disputed back to the experts for review. The model then learned patterns that experts could recognize but could not fully verbalize.
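The label-cleaning step can be pictured as a simple routing rule: any document where the model confidently disagrees with the expert's label goes back to a human for a second look. The sketch below is only an illustration of that idea, not the procedure Bridgewater or Thinking Machines used. The `LabeledDoc` type, the 0/1 label encoding, and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LabeledDoc:
    doc_id: str
    expert_label: int   # hypothetical encoding: 1 = worth reading, 0 = skip
    model_prob: float   # model's estimated probability that the label is 1


def disputed_cases(docs, threshold=0.8):
    """Return docs where the model confidently disagrees with the expert label.

    `threshold` is an illustrative cutoff on the probability mass the model
    places on the label the expert did NOT choose.
    """
    disputed = []
    for d in docs:
        # Probability the model assigns to the label the expert chose.
        p_expert = d.model_prob if d.expert_label == 1 else 1.0 - d.model_prob
        if 1.0 - p_expert >= threshold:
            disputed.append(d)
    return disputed


docs = [
    LabeledDoc("a", 1, 0.95),  # model agrees with the expert
    LabeledDoc("b", 1, 0.10),  # model confidently disagrees
    LabeledDoc("c", 0, 0.55),  # mild disagreement, below the threshold
    LabeledDoc("d", 0, 0.92),  # model confidently disagrees
]

# Only the confident disagreements are queued for expert re-review.
review_queue = [d.doc_id for d in disputed_cases(docs)]
print(review_queue)
```

Routing only confident disagreements keeps the expert workload small: mild disagreements such as document "c" stay out of the queue, while the cases most likely to be mislabeled (or most likely to expose a model blind spot) get human attention.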
Training architecture
Training used interleaved batches, CISPO loss, and on-policy distillation from stronger teacher checkpoints. Interleaving helped the model share judgment across tasks without blending them into noise. CISPO controlled policy updates, so learning stayed aggressive without drifting into brittle shortcuts. (CISPO is a reinforcement-learning loss that caps how strongly each generated token can update the model, which improves training stability while keeping useful rare tokens active. It was proposed by the MiniMax team in 2025.) On-policy distillation penalized moves away from better teachers, then promoted stronger checkpoints.
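To make the "caps how strongly each token can update the model" idea concrete, here is a minimal sketch of per-token update weights under a CISPO-style rule, next to PPO-style clipping for comparison. It assumes the commonly described form of CISPO, in which the importance-sampling ratio is clipped and then treated as a constant that scales the gradient of the token's log-probability. The function names, the epsilon values, and the toy numbers are assumptions for illustration; advantage estimation, length normalization, and any details of how Thinking Machines integrated the loss are not covered here.

```python
import math


def cispo_token_weights(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Coefficient multiplying grad log pi(token) for each token, CISPO-style.

    The importance ratio is clipped and used as a fixed weight (stop-gradient),
    so every token keeps a bounded, non-zero contribution to the update.
    """
    weights = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
        weights.append(clipped * adv)
    return weights


def ppo_token_weights(logp_new, logp_old, advantages, eps=0.2):
    """For comparison: PPO-clip yields zero gradient for a token once its ratio
    has left the clipping range in the direction the advantage is pushing."""
    weights = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped_out = (adv > 0 and ratio > 1.0 + eps) or (adv < 0 and ratio < 1.0 - eps)
        weights.append(0.0 if clipped_out else ratio * adv)
    return weights


# Toy example: a rare token whose probability tripled, and a common token
# whose probability barely moved. Both have positive advantage.
logp_old = [math.log(0.01), math.log(0.50)]
logp_new = [math.log(0.03), math.log(0.52)]
advantages = [1.0, 1.0]

cispo = cispo_token_weights(logp_new, logp_old, advantages)
ppo = ppo_token_weights(logp_new, logp_old, advantages)
print("CISPO weights:", [round(w, 3) for w in cispo])
print("PPO weights:  ", [round(w, 3) for w in ppo])
```

In this toy case the rare token's ratio is far outside the clipping range, so PPO-style clipping drops it from the gradient entirely, while the CISPO-style rule keeps it in play with a capped weight. That is the sense in which the loss stays stable while "keeping useful rare tokens active".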
The result beat the best frontier model, with 29.8% fewer mistakes and 13.8x lower inference cost. The company did not disclose the exact model architecture or parameter count.
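It helps to see what "29.8% fewer mistakes" means in accuracy terms, since the figure is a relative reduction in errors, not a 29.8-point accuracy gain. The sketch below works through the arithmetic under a purely hypothetical baseline: the 78% starting accuracy is borrowed from the top of the expert-prompt range quoted earlier and is not a reported score for the best frontier model on the final benchmark.

```python
def accuracy_after_error_reduction(baseline_accuracy, relative_error_reduction):
    """Accuracy implied by cutting the baseline error rate by a relative fraction."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = baseline_error * (1.0 - relative_error_reduction)
    return 1.0 - new_error


# Hypothetical baseline, chosen only to illustrate the arithmetic.
hypothetical_baseline_accuracy = 0.78
new_accuracy = accuracy_after_error_reduction(hypothetical_baseline_accuracy, 0.298)

# "13.8x lower inference cost" expressed as a fraction of the baseline cost.
relative_cost = 1.0 / 13.8

print(f"implied accuracy: {new_accuracy:.4f}")
print(f"cost as a fraction of baseline: {relative_cost:.4f}")
```

Under that assumed baseline, a 22% error rate shrinks to roughly 15.4%, which corresponds to accuracy of about 84.6%, while inference costs about 7% of the baseline.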
What to watch

Watch for Thinking Machines' next benchmark results on enterprise-specific reasoning tasks, and whether other hedge funds adopt similar expert-in-the-loop training for proprietary workflows. A public paper or blog post detailing CISPO integration and model architecture would be the next signal.








