OPID: Agents Learn From Hindsight Without External Memory
SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld