Models (52)
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step120-reward
2B • Updated • 1
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step120-actor
2B • Updated • 1
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step110-reward
2B • Updated • 1
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step110-actor
2B • Updated • 1
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step100-reward
2B • Updated • 1
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step100-actor
2B • Updated • 1
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step90-reward
2B • Updated • 3
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step90-actor
2B • Updated • 5
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step80-reward
2B • Updated • 1
DPO-RM/Qwen2.5-Math-1.5B-prime-no_logSoftmax_refRM-beta1-eurus_rl_15k-step80-actor
2B • Updated • 1