🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF

GGUF quantizations of hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, a reasoning SFT fine-tune of Qwen/Qwen3.6-35B-A3B on Claude Opus 4.6-style chain-of-thought distillation data.

The source fine-tune is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples. Treat these GGUF files as text-generation/runtime quantizations of the merged fine-tuned checkpoint.

This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.


Available GGUF Quantizations

This repo is intended to host the following GGUF variants. Files are uploaded as each quantization finishes.

| Quant | Typical use |
| --- | --- |
| Q4_K_M | Smallest practical general-purpose quant for local inference |
| Q5_K_M | Better quality/size balance than Q4 |
| Q6_K | Higher-quality quant when VRAM/RAM budget allows |
| Q8_0 | Largest quant here; closest to source quality among these options |
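As a rough guide to which quant fits your machine, file size scales with the effective bits per weight. The sketch below uses approximate bits-per-weight figures commonly cited for llama.cpp K-quants, not values measured from these exact files:

```python
# Rough GGUF file-size estimate: params * effective bits-per-weight / 8.
# The bits-per-weight values are approximations for llama.cpp K-quants,
# not measured from these specific files.
PARAMS = 35e9  # 35B parameters

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q6_K": 6.59,
    "Q8_0": 8.50,
}

def estimate_gib(params: float, bpw: float) -> float:
    """Approximate on-disk size in GiB, ignoring metadata overhead."""
    return params * bpw / 8 / 2**30

for quant, bpw in BITS_PER_WEIGHT.items():
    print(f"{quant}: ~{estimate_gib(PARAMS, bpw):.1f} GiB")
```

Actual file sizes will differ somewhat because mixed-precision quants keep some tensors (e.g. embeddings) at higher precision; check the repo's file list for exact sizes.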

Benchmark Results

The benchmark below was run on the merged source model, not separately on each GGUF quant. Quantization can change scores, especially at lower bitrates, so treat this as source-checkpoint context.

The MMLU-Pro pass used 70 total questions per model: --limit 5 across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.

| Benchmark | Harness | Samples per model | Setting | Metric | Base model | Source merged model | Delta |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU-Pro overall | lm-evaluation-harness | 70 | --limit 5 across 14 subjects | exact_match, custom-extract | 42.86% | 75.71% | +32.85 pp |

Base model: Qwen/Qwen3.6-35B-A3B. Source merged model: hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.
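As a quick sanity check on the numbers above: with 70 questions, the reported percentages correspond to whole-number exact-match counts, assuming a simple micro-average over all questions (an assumption; the harness may aggregate subjects differently):

```python
# Sanity-check the reported MMLU-Pro percentages against the 70-question run.
# Assumes a simple micro-average over all questions.
TOTAL_QUESTIONS = 70  # --limit 5 across 14 MMLU-Pro subjects

def implied_correct(pct: float, total: int = TOTAL_QUESTIONS) -> int:
    """Number of correct answers implied by a reported percentage."""
    return round(pct / 100 * total)

base_correct = implied_correct(42.86)    # 30 of 70
merged_correct = implied_correct(75.71)  # 53 of 70
delta_pp = 75.71 - 42.86                 # +32.85 percentage points
print(base_correct, merged_correct, f"{delta_pp:.2f} pp")
```

So the reported delta amounts to 23 additional correct answers out of 70, which is why a larger, full-coverage benchmark run would be needed before drawing strong conclusions.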

Community benchmarks welcome

To better understand this fine-tuned model and its GGUF quantizations, I welcome independent benchmark results. If you run evaluations, please include the benchmark name, harness/script, sample count, decoding settings, quant file, and raw logs or result files when possible.

Share results by opening a PR/discussion or DMing @hesamation on X.

Training Summary

Qwen/Qwen3.6-35B-A3B
  -> supervised fine-tuning with LoRA
  -> merged full model
  -> GGUF quantization with llama.cpp
| Setting | Value |
| --- | --- |
| Fine-tuning method | Supervised fine-tuning with LoRA |
| LoRA target | Attention-only modules |
| LoRA rank / alpha | 32 / 32 |
| Micro-batch size | 1 |
| Gradient accumulation | 32 |
| Epochs | 2 |
| Completed steps | 762 / 762 |
| Final reported training loss | 0.3362497625740494 |
| Dataset max tokens | 8192 |
| Max sequence length | 32768 |
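The reported step count is internally consistent with the batch settings. The arithmetic below derives the effective batch size and the implied per-epoch training-set size; the card does not state the final post-filtering sample count, so the last figure is an inference:

```python
# Relate the reported training settings:
#   effective batch = micro-batch * gradient accumulation
#   completed steps ~= epochs * (samples_per_epoch / effective_batch)
micro_batch = 1
grad_accum = 32
epochs = 2
completed_steps = 762

effective_batch = micro_batch * grad_accum    # 32 sequences per optimizer step
steps_per_epoch = completed_steps // epochs   # 381
# Implied training-set size, to within one effective batch:
implied_samples = steps_per_epoch * effective_batch  # 12192
print(effective_batch, steps_per_epoch, implied_samples)
```

The three requested dataset counts listed under Training Data sum to 14,233, so the implied ~12.2k samples suggests some examples were dropped during normalization or the 8192-token dataset cap, though the card does not confirm the final count.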

Training Data

The source model samples and normalizes reasoning conversations from three datasets, then renders them with the qwen3-thinking chat template and response-only SFT masking.

| Dataset | Requested sample count | Role |
| --- | --- | --- |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Claude Opus reasoning trajectories |
| Jackrong/Qwen3.5-reasoning-700x | 700 | Curated Qwen reasoning samples |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Additional Claude Opus reasoning examples |
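"Response-only SFT masking" means the loss is computed only on the assistant's tokens: everything before the response is labeled with the ignore index so it contributes no gradient. The toy sketch below illustrates the idea with a whitespace "tokenizer"; the real pipeline uses the qwen3-thinking chat template and integer token ids:

```python
# Toy illustration of response-only SFT masking: every token before the
# assistant's response gets label -100 (ignored by the loss), so gradients
# flow only through the model's own reply. Real pipelines operate on
# integer token ids rendered through the chat template.
IGNORE_INDEX = -100

def mask_labels(prompt_tokens: list, response_tokens: list):
    input_ids = prompt_tokens + response_tokens
    labels = [IGNORE_INDEX] * len(prompt_tokens) + list(response_tokens)
    return input_ids, labels

prompt = "<|user|> What is 2 + 2 ? <|assistant|>".split()
response = "<think> 2 + 2 = 4 </think> 4".split()
ids, labels = mask_labels(prompt, response)
print(labels[: len(prompt)])  # all -100: prompt tokens carry no loss
```

With thinking-style data, the `<think>` span is part of the response, so the model is trained to reproduce the chain of thought as well as the final answer.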

Intended Use

These GGUF files are intended for local or server-side text inference through runtimes that support GGUF and the Qwen3.6 architecture, such as recent llama.cpp builds. Choose the quantization based on your memory budget and quality target.
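For example, a minimal llama.cpp server invocation might look like the command constructed below. The model filename is an assumption; check the repo's file list for the exact names:

```python
import shlex

# Sketch of a llama.cpp server invocation for one of these quants.
# The filename is an assumption -- check the repo for exact file names.
model_file = (
    "Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-Q4_K_M.gguf"
)

cmd = [
    "llama-server",
    "-m", model_file,   # path to the GGUF quant
    "-c", "8192",       # context window; raise it if your RAM allows
    "--port", "8080",   # serves an OpenAI-compatible HTTP endpoint
]
print(shlex.join(cmd))
```

A recent llama.cpp build is required for Qwen3.6 architecture support; older builds will refuse to load the file.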

Because the fine-tune is text-only, image/video behavior should be treated as inherited from the base model rather than improved by this training run.

Acknowledgements

Thanks to the Qwen team for the base model, Unsloth for the training stack, llama.cpp for GGUF tooling, and Jackrong for the public reasoning-distillation workflow that inspired this fine-tune.

Format: GGUF · Model size: 35B params · Architecture: qwen35moe

