AIPlans/Qwen3-0.6B-GRPO-CrossCoder-Only
AIPlans/Qwen3-0.6B-ORPO-CrossCoder-Only
AIPlans/Qwen3-0.6B-IPO-CrossCoder-Only
AIPlans/Qwen3-0.6B-KTO-CrossCoder-Only
AIPlans/Qwen3-0.6B-PPO-CrossCoder-Only
Text Generation • 0.6B • Updated • 204 • 1
Text Generation • 0.8B • Updated • 4
AIPlans/Qwen3-0.6B-ORPO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-GRPO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-KTO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-IPO-Crosscoder-MixedDataset
Updated
Reinforcement Learning • 0.6B • Updated • 9 • 2
AIPlans/Qwen3-0.6B-GRPO-RM_NVIDIA
Text Generation • 0.6B • Updated • 14
AIPlans/Qwen3-0.6B-GRPO_Epoch2
Text Generation • 0.6B • Updated • 6
AIPlans/Qwen3-0.6B-GRPO_Epoch1
Text Generation • 0.6B • Updated • 3
Reinforcement Learning • 0.6B • Updated • 18 • 1
AIPlans/qwen3-0.6b-base-PPO-hs2
Updated
AIPlans/Qwen3-0.6B-DPO_Epoch_1
Text Generation • 0.6B • Updated • 5
AIPlans/Qwen3-0.6B-SFT-hs2
Text Generation • 0.6B • Updated • 7
AIPlans/Qwen3-0.6B-RM-hs2
Text Classification • 0.6B • Updated • 14 • 1
Text Generation • Updated • 9
AIPlans/Qwen3-0.6B-DPO_NOTLORA
Text Generation • 0.6B • Updated • 3
Text Generation • Updated • 9 • 1
Text Generation • Updated • 12
AIPlans/qwen3-0.6b-hh-rlhf-sft
0.6B • Updated • 5
AIPlans/Qwen3-0.6B-KTO_trial
Text Generation • 0.6B • Updated • 4 • 1
AIPlans/qwen3-0.6b-sft-hh-rlhf-lora
Updated