Tencent's Training-Free GRPO: A Paradigm Shift in AI Alignment Without Fine-Tuning
Tencent researchers have introduced Training-Free GRPO, a method that achieves reinforcement learning-level alignment results for just $18 instead of $10,000—with zero parameter updates. This breakthrough could fundamentally change how we optimize language models.










