Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation • Paper • 2603.12247 • Published Mar 12
RLPR • Collection • Extrapolating RLVR to General Domains without Verifiers • 6 items • Updated 2 days ago
RLPR: Extrapolating RLVR to General Domains without Verifiers • Paper • 2506.18254 • Published Jun 23, 2025