
Model Card: a2ui_qwen3_235b_grpo_0415_v1

1. Overview

This model version is a GRPO-trained A2UI model initialized from the 235B SFT checkpoint a2ui_qwen3_235b_sft_0414_v1. It uses the reward-v2 pipeline and was trained on a filtered subset of the 0409 GRPO data with a larger response budget (max_resp = 16384).
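
For readers unfamiliar with GRPO, the central step is that several responses are sampled per prompt and each response's advantage is its reward standardized against the rest of its group. The following is a minimal sketch of that step only, using a group of 8 to match group_size in the training configuration. The function name and the reward values are illustrative assumptions, and this is not the training code used for this model.

```python
import math

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of responses to the same prompt.

    ASSUMPTION: plain mean/std normalization; the actual reward-v2 pipeline
    and any clipping or KL terms are not described in this card.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for 8 sampled responses to a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean receive positive advantages and the rest receive negative ones, so no separate value model is needed to form a baseline.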

2. Model Metadata

| Field | Value |
| --- | --- |
| model_version_id | a2ui_qwen3_235b_grpo_0415_v1 |
| algorithm | grpo |
| base_model | Qwen/Qwen3-235B-A22B-Instruct-2507 |
| precision | bf16 |

3. Lineage

| Field | Value |
| --- | --- |
| parent_model_version_id | a2ui_qwen3_235b_sft_0414_v1 |

4. Training Data

| Field | Value |
| --- | --- |
| dataset_version_id | a2ui_data_grpo_0409_v2 |

5. Training Configuration

| Field | Value |
| --- | --- |
| epochs_or_steps | steps=65 |
| batch_size | 32.0 |
| group_size | 8.0 |
| lr | 3e-5 |
| max_ctx | 8192.0 |
| max_resp | 16384 |
| lora_rank | 16.0 |
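
The values listed above can be captured as a small config object, which also makes the implied sampling volume easy to estimate. This sketch assumes that batch_size counts prompts per step and group_size counts responses sampled per prompt; the card does not state this, so the derived totals are illustrative only. The class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GrpoRunConfig:
    # Values copied from the card; counts listed as floats (e.g. 32.0)
    # are treated as integers here.
    steps: int = 65
    batch_size: int = 32
    group_size: int = 8
    lr: float = 3e-5
    max_ctx: int = 8192
    max_resp: int = 16384
    lora_rank: int = 16

    def rollouts_per_step(self) -> int:
        # ASSUMPTION: batch_size is prompts per step and each prompt
        # gets group_size sampled responses.
        return self.batch_size * self.group_size

    def total_rollouts(self) -> int:
        return self.steps * self.rollouts_per_step()

cfg = GrpoRunConfig()
print(cfg.rollouts_per_step(), cfg.total_rollouts())
```

Under that assumption the run samples 256 responses per step and 16,640 over all 65 steps. If batch_size instead already counts responses, divide both figures by group_size.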

6. Change Summary

Warm-started from the 235B SFT checkpoint, switched to the reward-v2 pipeline, and trained on a filtered subset of the 0409 GRPO data with a larger response budget.
