Update README.md
README.md CHANGED

@@ -22,11 +22,22 @@ This curriculum greatly enhances the model’s efficiency and reasoning depth, a

### Flagship-Level Efficient Reasoning

+<p align="center">
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/X7mZSJQX_fsAAAAAT_AAAAgADkV7AQFr/original"/>
+<p>
+
+<p align="center">
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/DZ1kSKT57J0AAAAAUOAAAAgADkV7AQFr/original"/>
+<p>
+
We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates **superior complex reasoning ability** and overall advantage.

In the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**

+<p align="center">
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/CNhVT4sGM0kAAAAAciAAAAgADkV7AQFr/original"/>
+<p>

### Aesthetic Understanding and Front-End Generation

@@ -66,6 +77,10 @@ FP8 mixed-precision training yields **15 %+ end-to-end speedup**, improved memor
A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40 %+.
System-level optimizations—fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry—ensure stable trillion-scale training.

+<p align="center">
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/StIxTrsy-_MAAAAAVTAAAAgADkV7AQFr/original"/>
+<p>
+
Pre-training used over **20 T high-quality tokens**, with **> 40 % reasoning-dense data** in later stages.
Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**”, improving downstream reasoning stability.
A custom **WSM (Warmup–Stable–Merge)** LR scheduler with mid-train checkpoint merging simulates LR decay and boosts generalization.

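The WSM (Warmup–Stable–Merge) scheduler in the hunk above is only named, not specified. As a rough illustration of the general idea (warm up to a constant learning rate, then merge recent stable-phase checkpoints rather than running an explicit decay phase), here is a minimal Python sketch. The linear warmup, the constant stable LR, and the uniform checkpoint average are assumptions for illustration, not the actual Ling-1T recipe.

```python
# Minimal sketch of a Warmup-Stable-Merge (WSM) style schedule.
# Assumptions (not from the README): linear warmup, constant "stable" LR,
# and a uniform average over the last few checkpoints in place of LR decay.

def wsm_lr(step: int, warmup_steps: int, peak_lr: float) -> float:
    """Warmup then stable: ramp linearly to peak_lr, then hold it (no decay phase)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


def merge_checkpoints(checkpoints: list[dict]) -> dict:
    """Uniformly average parameters from the last few stable-phase checkpoints.

    The merged weights take the place of an explicit LR-decay phase.
    """
    merged = {}
    for name in checkpoints[0]:
        merged[name] = sum(ckpt[name] for ckpt in checkpoints) / len(checkpoints)
    return merged


# Toy usage: LR holds at peak after warmup; final weights come from merging.
lrs = [wsm_lr(s, warmup_steps=2_000, peak_lr=3e-4) for s in range(6_000)]
final = merge_checkpoints([{"w": 1.0}, {"w": 1.2}, {"w": 1.1}])
print(lrs[0], lrs[-1], final)
```

In this toy version, the averaged late-stage weights stand in for the smoothing that an LR-decay phase would normally provide, which is the sense in which the merge "simulates LR decay" as described above.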
@@ -80,6 +95,12 @@ For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimiza
Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.

+<p align="center">
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/o10CRK8P8hwAAAAAWwAAAAgADkV7AQFr/original"/>
+<p>
+<p align="center">
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J7I6QZqI-6AAAAAAZHAAAAgADkV7AQFr/original"/>
+<p>

## Evaluation

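The LPO description in the hunk above is high-level, so the sketch below only illustrates what treating *sentences* as the action unit could look like next to token-level GRPO or sequence-level GSPO: token log-probabilities are grouped by sentence, one importance ratio is formed per sentence, and a PPO-style clipped surrogate is applied at that granularity. The sentence splitter, the shared advantage, and the clipping range are assumptions; this is not the released LPO algorithm.

```python
import math
import re

def sentence_spans(text: str) -> list[tuple[int, int]]:
    """Character spans of sentences, split on ., !, or ? followed by whitespace/end."""
    spans, start = [], 0
    for m in re.finditer(r"[.!?](?:\s+|$)", text):
        spans.append((start, m.end()))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def lpo_style_loss(text, token_offsets, logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate where each sentence (not each token, not the whole
    sequence) carries its own importance ratio and clip."""
    loss = 0.0
    for lo, hi in sentence_spans(text):
        idx = [i for i, off in enumerate(token_offsets) if lo <= off < hi]
        if not idx:
            continue
        # One importance ratio per sentence: aggregate its tokens' log-prob deltas.
        ratio = math.exp(sum(logp_new[i] - logp_old[i] for i in idx))
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        loss += -min(ratio * advantage, clipped * advantage)
    return loss


# Toy usage: 5 tokens with hypothetical start offsets, one shared advantage.
text = "Thus x = 2. So the answer is 2."
offsets = [0, 5, 9, 12, 22]
new_lp = [-0.1, -0.2, -0.3, -0.1, -0.2]
old_lp = [-0.2, -0.2, -0.4, -0.1, -0.3]
print(lpo_style_loss(text, offsets, new_lp, old_lp, advantage=1.0))
```

A token-level variant would clip each token's ratio independently and a sequence-level one would form a single ratio for the whole response; in this sketch the only change is the grouping, which is the distinction the paragraph above draws between GRPO, GSPO, and LPO.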