Papers
RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
Models (12)
OpenRubrics/RubricARROW-8B-Rubric
Text Generation • 308k • Updated • 121
OpenRubrics/RubricARROW-8B-Judge
Text Generation • 308k • Updated • 115
OpenRubrics/RubricRM-4B-Rubric
196k • Updated • 2
OpenRubrics/RubricRM-4B-Judge
196k • Updated • 5
OpenRubrics/RubricRM-4B-Rubric-v2
196k • Updated • 21
OpenRubrics/RubricRM-8B-Judge
308k • Updated • 11
OpenRubrics/RubricRM-8B-Rubric
308k • Updated • 12
OpenRubrics/RubricRM-8B-Judge-v2
308k • Updated • 19
OpenRubrics/RubricRM-8B-Rubric-v2
308k • Updated • 18
OpenRubrics/RubricRM-4B-Judge-v2
196k • Updated • 13 • 1