TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents • Paper • 2604.24005 • Published 5 days ago
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning • Paper • 2604.02721 • Published 29 days ago
Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models? • Paper • 2603.22582 • Published Mar 23