Add model card and metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ base_model: openPangu/openPangu-Embedded-7B
+ tags:
+ - diffusion
+ - parallel-generation
+ ---
+
+ # FLUID-7B
+
+ FLUID (Flexible Unidirectional Inference Diffusion) is a framework for efficiently adapting pre-trained autoregressive (AR) backbones into parallel diffusion models. By enforcing **Strictly Causal Alignment** and introducing **Elastic Horizons**, FLUID reaches competitive benchmark performance with far less training data than standard diffusion models (2.7B adaptation tokens versus 2.0T for LLaDA-8B).
+
+ - **Paper:** [From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons](https://huggingface.co/papers/2605.27387)
+ - **GitHub Repository:** [Oli-lab-nun/FLUID](https://github.com/Oli-lab-nun/FLUID)
+
+ ## Key Features
+
+ * **Strictly Causal Alignment**: Unlike bidirectional diffusion models, FLUID uses a lower-triangular attention mask to preserve the inductive biases of the AR prior. This allows direct initialization from GPT-style checkpoints such as openPangu-Embedded-7B.
+ * **Elastic Horizon Modeling**: An entropy-driven mechanism adjusts the denoising stride to the local information density: it "sprints" through predictable text and "downshifts" for complex reasoning.
+ * **Training Efficiency**: Reaches 91.9 on GSM8K and 61.8 on MATH500 using only 2.7B tokens of adaptation data, ahead of LLaMA-3-8B and LLaDA-8B, which were trained on trillions of tokens.
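The lower-triangular mask named under Strictly Causal Alignment can be illustrated with a small, dependency-free sketch. This is not taken from the FLUID code base: the helper names `causal_mask` and `masked_softmax` are hypothetical, and the example only shows the general technique of letting position `i` attend to positions `j <= i`.

```python
import math


def causal_mask(seq_len):
    """Lower-triangular mask: entry [i][j] is True when j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight to masked-out positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        visible = [s for s, ok in zip(row, allowed) if ok]
        peak = max(visible)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical toy input: uniform attention scores over four positions.
mask = causal_mask(4)
weights = masked_softmax([[0.0] * 4 for _ in range(4)], mask)

for row in weights:
    print([round(w, 3) for w in row])
```

With uniform scores, each row spreads its weight evenly over the current and earlier positions and gives none to later ones, which is the same visibility pattern a GPT-style checkpoint was trained with.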
+
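The model card does not spell out how Elastic Horizon Modeling turns entropy into a denoising stride, so the following is only a sketch of one plausible mapping. Everything here is an assumption made for illustration: the function names, the `min_stride` and `max_stride` bounds, and the linear interpolation over normalised Shannon entropy are not from the paper or the repository.

```python
import math


def shannon_entropy(probs):
    """Entropy in nats of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(probs, min_stride=1, max_stride=8):
    """Hypothetical rule: low entropy gives a long stride, high entropy a short one.

    The entropy is normalised by its maximum, log(len(probs)), and the stride is
    linearly interpolated between max_stride and min_stride.
    """
    max_entropy = math.log(len(probs))
    normalised = shannon_entropy(probs) / max_entropy if max_entropy > 0 else 0.0
    stride = max_stride - normalised * (max_stride - min_stride)
    return max(min_stride, min(max_stride, round(stride)))


# Toy next-token distributions over a four-word vocabulary (illustrative only).
confident = [0.97, 0.01, 0.01, 0.01]  # predictable text
uncertain = [0.25, 0.25, 0.25, 0.25]  # maximally ambiguous step

print("confident stride:", elastic_stride(confident))
print("uncertain stride:", elastic_stride(uncertain))
```

The confident distribution yields a longer stride than the uniform one, mirroring the "sprint" and "downshift" behaviour described in the feature list.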
+ ## Performance
+
+ FLUID-7B outperforms the LLaDA-8B diffusion baseline on all four benchmarks and posts the highest GSM8K score in the table; Qwen-2.5-7B remains ahead on MMLU, MATH500, and HumanEval:
+
+ | Model | Type | Training tokens | MMLU | GSM8K | MATH500 | HumanEval |
+ | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+ | LLaMA-3-8B | AR | 15T | 68.4 | 78.3 | 27.4 | 59.8 |
+ | Qwen-2.5-7B | AR | 18T | 76.6 | 91.6 | 84.8 | 79.2 |
+ | LLaDA-8B | Diff | 2.0T | 65.5 | 36.2 | 34.2 | 47.6 |
+ | **FLUID-7B (Ours)** | **Diff** | **2.7B** | **67.8** | **91.9** | **61.8** | **60.4** |
+
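To put the token counts in the table on a common scale, the short calculation below divides each baseline's training budget by FLUID's 2.7B adaptation tokens. The numbers are copied from the table; the variable names are illustrative. Note that the 2.7B figure counts adaptation data only, and the pretraining budget of the openPangu-Embedded-7B base model is not listed in the table.

```python
# Token budgets as reported in the table above.
ADAPTATION_TOKENS = 2.7e9  # FLUID-7B adaptation data

BASELINE_TOKENS = {
    "LLaMA-3-8B": 15e12,
    "Qwen-2.5-7B": 18e12,
    "LLaDA-8B": 2.0e12,
}

# Ratio of each baseline's budget to the FLUID adaptation budget.
ratios = {name: tokens / ADAPTATION_TOKENS for name, tokens in BASELINE_TOKENS.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f}x the FLUID adaptation budget")
```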
+ ## Acknowledgements
+
+ FLUID-7B is adapted from the **openPangu-Embedded-7B** base model. We gratefully acknowledge the developers of openPangu for releasing their model and related resources to the community.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{fluid2026,
+ title={From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons},
+ author={Anonymous},
+ booktitle={Submission to ACL 2026},
+ year={2026}
+ }
+ ```