Add model card and metadata
#1
by nielsr HF Staff - opened
README.md
ADDED
@@ -0,0 +1,47 @@
---
library_name: transformers
pipeline_tag: text-generation
base_model: openPangu/openPangu-Embedded-7B
tags:
- diffusion
- parallel-generation
---

# FLUID-7B

FLUID (Flexible Unidirectional Inference Diffusion) is a framework for efficiently adapting pre-trained autoregressive (AR) backbones into parallel diffusion models. By enforcing **Strictly Causal Alignment** and introducing **Elastic Horizons**, FLUID achieves state-of-the-art performance with significantly less training data than standard diffusion models.

- **Paper:** [From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons](https://huggingface.co/papers/2605.27387)
- **GitHub Repository:** [Oli-lab-nun/FLUID](https://github.com/Oli-lab-nun/FLUID)

## Key Features

* **Strictly Causal Alignment**: Unlike bidirectional diffusion, FLUID uses a lower-triangular attention mask to preserve the inductive biases of AR priors. This enables seamless initialization from GPT-style checkpoints such as openPangu-Embedded-7B.
* **Elastic Horizon Modeling**: An entropy-driven mechanism dynamically modulates denoising strides based on local information density. It "sprints" through predictable text and "downshifts" for complex reasoning.
* **Training Efficiency**: FLUID uses only 2.7B tokens of adaptation data, yet on the benchmarks reported in the Performance section it outperforms the diffusion baseline LLaDA-8B, which was trained on 2.0T tokens.
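
The two mechanisms above can be illustrated with a small sketch. This is not the FLUID implementation: the function names `causal_mask`, `entropy`, and `elastic_stride`, the stride bounds, and the linear entropy-to-stride mapping are all assumptions made for illustration. It only shows what a lower-triangular mask looks like and how a stride could shrink as predictive entropy grows.

```python
import math


def causal_mask(n):
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(probs, min_stride=1, max_stride=8):
    """Hypothetical rule mapping entropy to a denoising stride.

    Low entropy (predictable text) gives a large stride,
    high entropy (uncertain text) gives a small stride.
    """
    max_entropy = math.log(len(probs))  # entropy of the uniform distribution
    if max_entropy == 0.0:
        return max_stride  # a single-outcome distribution is fully predictable
    certainty = 1.0 - entropy(probs) / max_entropy
    return min_stride + round(certainty * (max_stride - min_stride))


if __name__ == "__main__":
    for row in causal_mask(4):
        print("".join("x" if allowed else "." for allowed in row))
    print(elastic_stride([1.0, 0.0, 0.0, 0.0]))      # fully predictable
    print(elastic_stride([0.25, 0.25, 0.25, 0.25]))  # maximally uncertain
```

Under these assumptions a one-hot distribution yields the maximum stride and a uniform distribution yields the minimum; the actual schedule used by FLUID may differ and should be taken from the paper or repository.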

## Performance

FLUID-7B outperforms the diffusion baseline LLaDA-8B on all four benchmarks and is competitive with AR baselines trained on far more data: it has the highest GSM8K score, beats LLaMA-3-8B on GSM8K, MATH500, and HumanEval, and trails Qwen-2.5-7B on MMLU, MATH500, and HumanEval. The token count for FLUID-7B covers adaptation data only.

| Model | Type | Training Tokens | MMLU | GSM8K | MATH500 | HumanEval |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| LLaMA-3-8B | AR | 15T | 68.4 | 78.3 | 27.4 | 59.8 |
| Qwen-2.5-7B | AR | 18T | 76.6 | 91.6 | 84.8 | 79.2 |
| LLaDA-8B | Diff | 2.0T | 65.5 | 36.2 | 34.2 | 47.6 |
| **FLUID-7B (Ours)** | **Diff** | **2.7B** | **67.8** | **91.9** | **61.8** | **60.4** |
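
To make the data-efficiency comparison concrete, the short script below recomputes token ratios and per-benchmark wins from the figures in the table. The numbers are copied from the table; the `RESULTS` layout and the `compare` helper are illustrative names, and the FLUID-7B token count is adaptation data only, so the ratios do not account for the pre-training of its base model.

```python
# Figures copied from the table: (training tokens, MMLU, GSM8K, MATH500, HumanEval).
RESULTS = {
    "LLaMA-3-8B": (15e12, 68.4, 78.3, 27.4, 59.8),
    "Qwen-2.5-7B": (18e12, 76.6, 91.6, 84.8, 79.2),
    "LLaDA-8B": (2.0e12, 65.5, 36.2, 34.2, 47.6),
    "FLUID-7B": (2.7e9, 67.8, 91.9, 61.8, 60.4),
}
BENCHMARKS = ("MMLU", "GSM8K", "MATH500", "HumanEval")


def compare(baseline, ours="FLUID-7B"):
    """Return (token ratio, benchmarks where `ours` scores higher than `baseline`)."""
    base_tokens, *base_scores = RESULTS[baseline]
    our_tokens, *our_scores = RESULTS[ours]
    wins = [
        name
        for name, base, mine in zip(BENCHMARKS, base_scores, our_scores)
        if mine > base
    ]
    return base_tokens / our_tokens, wins


if __name__ == "__main__":
    for name in RESULTS:
        if name != "FLUID-7B":
            ratio, wins = compare(name)
            print(f"{name}: {ratio:,.0f}x more tokens; FLUID-7B higher on {wins}")
```

On these figures, LLaDA-8B used roughly 741 times as many training tokens as FLUID-7B used for adaptation, and FLUID-7B scores higher on all four benchmarks; against Qwen-2.5-7B it scores higher only on GSM8K.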

## Acknowledgements

FLUID-7B is adapted from the **openPangu-Embedded-7B** base model. We gratefully acknowledge the developers of openPangu for releasing their model and related resources to the community.

## Citation

```bibtex
@inproceedings{fluid2026,
  title={From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons},
  author={Anonymous},
  booktitle={Submission to ACL 2026},
  year={2026}
}
```