Update README.md
README.md CHANGED
@@ -4,9 +4,13 @@ base_model:
 - Qwen/Qwen3-32B
 ---
 
-The missing "base model" of Qwen3-32B. This model
+The missing "base model" of Qwen3-32B. This model serves as the foundation for our R1-0528 distillation work.
 
 This model is the result of continued pre-training on Qwen3-32B, using a multilingual dataset of mixed code and text.
 
-The purpose of training this model is to provide a model that is close to a "pre-trained" state, reducing the influence of the original Qwen3's linguistic style on subsequent fine-tuning efforts.
+The purpose of training this model is to provide a model that is close to a "pre-trained" state, reducing the influence of the original Qwen3's linguistic style on subsequent fine-tuning efforts.
+
+We are providing this model to the community to serve as a base model for further SFT, this model is not intended for direct inference.
+
+
 