---
license: apache-2.0
library_name: transformers
tags:
- dllm
- diffusion
- llm
- text_generation
---

# LLaDA2.0-flash-preview

**LLaDA2.0-flash-preview** is a diffusion language model featuring a 100BA6B Mixture-of-Experts (MoE) architecture: 100 billion total parameters, of which roughly 6 billion are activated per token. As an enhanced, instruction-tuned iteration of the LLaDA series, it is optimized for practical applications.

<div align="center">
  <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*kLORSaRfSK8AAAAAgIAAAAgAemJ7AQ/original" width="800" />
</div>

---

| Benchmark | Ling-flash-2.0 | LLaDA2.0-mini-preview | LLaDA2.0-flash-preview |
| :------------------------------ | :-------------: | :-------------------------: | :---------------------: |
| **Average** | 79.93 | 66.59 | 77.03 |
| **Knowledge** | | | |
| MMLU | 87.98 | 72.49 | 83.15 |
| MMLU-PRO | 76.84 | 49.22 | 66.16 |
| CMMLU | 86.59 | 67.53 | 79.64 |
| C-EVAL | 88.03 | 66.54 | 79.28 |
| **Reasoning** | | | |
| SQuAD 2.0 | 81.32 | 85.61 | 90.61 |
| DROP | 88.32 | 79.49 | 88.17 |
| KorBench | 68.96 | 37.26 | 53.28 |
| **Coding** | | | |
| CruxEval-O | 82.75 | 61.88 | 74.50 |
| MBPP | 85.01 | 77.75 | 86.65 |
| MultiPL-E | 65.76 | 62.43 | 72.38 |
| HumanEval | 85.98 | 80.49 | 88.41 |
| BigCodeBench-Full | 40.70 | 30.44 | 40.44 |
| **Math** | | | |
| GSM8K | 95.45 | 89.01 | 95.75 |
| MATH | 96.10 | 73.50 | 83.52 |
| **Agent & Alignment** | | | |
| BFCL Live | 67.57 | 74.11 | 74.86 |
| IFEval (strict-prompt) | 81.52 | 62.50 | 75.60 |

## 🚀 Performance Highlights

+ **Leading MoE Architecture**:
An open-source **Mixture-of-Experts (MoE) diffusion large language model**, pre-trained from scratch on approximately **20 trillion tokens**.
+ **Efficient Inference**:
Of its **100 billion total parameters**, only **6.1 billion** are activated during inference. LLaDA2.0-flash-preview significantly reduces computational cost while outperforming open-source dense models of similar scale.
+ **Strong Code & Complex Reasoning**:
Excels at tasks such as **code generation** and **advanced mathematical reasoning**, demonstrating strong reasoning capabilities.
+ **Tool Use**:
Supports **tool calling** and achieves excellent performance on complex agent-based tasks (a hedged sketch follows this list).
+ **Open & Extensible**:
Fully open source with a commitment to transparency. We plan to release a **leading inference framework** and will continue investing in cutting-edge areas such as **diffusion LLMs (dLLMs)** to drive disruptive innovation.
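
As a hedged illustration of the tool-calling support mentioned above: recent `transformers` releases let you pass Python functions to `apply_chat_template` through its `tools` argument, which converts them to JSON schemas. The `get_weather` helper below is hypothetical, and whether this model's bundled chat template consumes that standard schema is an assumption; check the template shipped with the checkpoint for the exact format.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "inclusionAI/LLaDA2.0-flash-preview", trust_remote_code=True
)

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # hypothetical placeholder tool

# Build a tool-calling prompt; generation then proceeds as in the
# Transformers quickstart further below.
prompt_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[get_weather],  # converted to a JSON tool schema by transformers
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)
```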

## 🗺️ What's Next

+ **Supercharged Reasoning with LLaDA 2.0:** the LLaDA 2.0 series will be fine-tuned with **reinforcement learning**, unlocking a new level of sophisticated reasoning and problem-solving.
+ **Tools for Innovators:** we will release a **detailed tutorial** and our complete **post-training framework**. Whether you want to master the current model or build your own customized versions, you'll have the tools you need. Stay tuned!

---

## 📦 Model Variants

| Model ID | Description | Hugging Face Link |
| --- | --- | --- |
| `inclusionAI/LLaDA2.0-mini-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini-preview) |
| `inclusionAI/LLaDA2.0-flash-preview` | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-flash-preview) |

---

## 🔍 Model Overview

**LLaDA2.0-flash-preview** has the following specifications:

+ **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
+ **Total Parameters (Non-Embedding)**: 100B
+ **Activated Parameters**: 6.1B
+ **Number of Layers**: 32
+ **Attention Heads**: 32
+ **Context Length**: 4,096 tokens
+ **Position Embedding**: Rotary (RoPE)
+ **Vocabulary Size**: 157,184
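
A quick way to sanity-check these figures is to read them from the published config. This is a minimal sketch; the attribute names follow common `transformers` conventions and are an assumption for this custom remote-code architecture:

```python
from transformers import AutoConfig

# Minimal sketch: print a few architecture fields from the hub config.
# Attribute names are assumed to follow common transformers conventions;
# they may differ for this custom (trust_remote_code) model.
config = AutoConfig.from_pretrained(
    "inclusionAI/LLaDA2.0-flash-preview", trust_remote_code=True
)
for field in ("num_hidden_layers", "num_attention_heads", "vocab_size"):
    print(field, getattr(config, field, "not exposed"))
```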

---

### 🤗 Hugging Face Transformers

Make sure you have `transformers` and its dependencies installed (e.g. `pip install transformers torch`), then run:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face repo ID; a local checkpoint path works as well.
model_path = "inclusionAI/LLaDA2.0-flash-preview"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Why does Camus think that Sisyphus is happy?"
# Build the chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(model.device)

generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=512,
    block_length=32,
    steps=32,
    temperature=0.0,
)
generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)
print(generated_answer)
```
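
A rough reading of the generation arguments, to be confirmed against the model's bundled remote code: `gen_length` caps the number of generated tokens, `block_length` sets the size of each semi-autoregressive block that the diffusion process refines over `steps` denoising iterations, `eos_early_stop=True` halts once an end-of-sequence token appears, and `temperature=0.0` makes decoding deterministic.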

### Best Practices

To achieve optimal performance, we recommend the following settings:

1. **Sampling parameters**:
We suggest `temperature=0.0`, `block_length=32`, and `steps=32`. Higher temperature values may occasionally cause language mixing and slightly degrade model performance.

2. **Adequate output length**:
We recommend an output length of 2048 tokens for most queries. For benchmarks on problems that require longer outputs, such as math and programming competitions, we suggest raising the maximum output length to 4096 tokens. A sketch applying these settings follows this list.
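
For reference, here is a minimal sketch applying these recommendations to the quickstart above (it reuses `model`, `tokenizer`, and `input_ids` from that example):

```python
# Recommended settings from this section, applied to the quickstart example
# above (reuses `model`, `tokenizer`, and `input_ids` defined there).
generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=2048,  # raise to 4096 for math/programming-competition workloads
    block_length=32,
    steps=32,
    temperature=0.0,  # higher values may cause language mixing
)
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
```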

---

## 🌐 License

This project is licensed under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

---

## 🤝 Contact & Collaboration

For questions, collaboration, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA2.0-flash-preview) or open an issue in the [inclusionAI repositories](https://github.com/inclusionAI).

👉 Join us in advancing open, efficient, and intelligent language models!