Model Card: a2ui_qwen3_235b_grpo_0415_v1
1. Overview
This model version is a GRPO-trained A2UI model initialized from the 235B SFT checkpoint a2ui_qwen3_235b_sft_0414_v1. It uses the reward-v2 pipeline and is trained on a filtered subset of the 0409 GRPO data with a larger response budget (max_resp = 16384).
2. Model Metadata
| Field | Value |
|---|---|
| model_version_id | a2ui_qwen3_235b_grpo_0415_v1 |
| algorithm | grpo |
| base_model | Qwen/Qwen3-235B-A22B-Instruct-2507 |
| precision | bf16 |
3. Lineage
| Field | Value |
|---|---|
| parent_model_version_id | a2ui_qwen3_235b_sft_0414_v1 |
4. Training Data
| Field | Value |
|---|---|
| dataset_version_id | a2ui_data_grpo_0409_v2 |
5. Training Configuration
| Field | Value |
|---|---|
| epochs_or_steps | steps=65 |
| batch_size | 32 |
| group_size | 8 |
| lr | 3e-5 |
| max_ctx | 8192 |
| max_resp | 16384 |
| lora_rank | 16 |
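As a rough aid for reading the table, the sketch below collects the listed values in a small dataclass and derives the rollout volume they imply. It assumes that batch_size counts prompts per step and that each prompt receives group_size sampled responses; the card does not state this, so the derived numbers hold only under that reading. The class and method names are hypothetical and do not correspond to any training framework's config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrpoRunConfig:
    """Values copied from the table above; the class itself is hypothetical."""
    steps: int = 65
    batch_size: int = 32   # assumed: prompts per step
    group_size: int = 8    # assumed: sampled responses per prompt
    lr: float = 3e-5
    max_ctx: int = 8192
    max_resp: int = 16384
    lora_rank: int = 16

    def responses_per_step(self) -> int:
        # Under the assumption above: prompts per step x responses per prompt.
        return self.batch_size * self.group_size

    def total_responses(self) -> int:
        return self.steps * self.responses_per_step()


cfg = GrpoRunConfig()
per_step = cfg.responses_per_step()   # 256 under the stated assumption
total = cfg.total_responses()         # 16640 under the stated assumption
```

If batch_size instead already counts individual responses, the per-step figure would be 32 and the total 2080.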
6. Change Summary
Warm-start from the 235B SFT checkpoint, switch to the reward-v2 pipeline, and train on a filtered 0409 subset with a larger response budget.