zhanghanxiao committed (verified)
Commit 60f0774 · 1 Parent(s): 0a79af0

Update README.md

Files changed (1): README.md (+21 −0)
README.md CHANGED
@@ -22,11 +22,22 @@ This curriculum greatly enhances the model’s efficiency and reasoning depth, a
 
 ### Flagship-Level Efficient Reasoning
 
+<p align="center">
+  <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/X7mZSJQX_fsAAAAAT_AAAAgADkV7AQFr/original"/>
+</p>
+
+<p align="center">
+  <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/DZ1kSKT57J0AAAAAUOAAAAgADkV7AQFr/original"/>
+</p>
+
 We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
 Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates **superior complex reasoning ability** and overall advantage.
 
 In the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**
 
+<p align="center">
+  <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/CNhVT4sGM0kAAAAAciAAAAgADkV7AQFr/original"/>
+</p>
 
 ### Aesthetic Understanding and Front-End Generation
 
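To make the accuracy-versus-length trade-off named in this hunk concrete, here is a minimal sketch of how a Pareto frontier over (average output tokens, accuracy) points is computed. The model names and numbers are made-up placeholders, not actual AIME 25 results.

```python
# Minimal sketch: Pareto frontier of reasoning accuracy vs. reasoning length.
# The (avg_output_tokens, accuracy) pairs below are illustrative placeholders.
models = {
    "model_a": (12000, 0.70),
    "model_b": (18000, 0.74),
    "model_c": (9000, 0.61),
    "model_d": (15000, 0.68),
}

def pareto_frontier(points: dict) -> list:
    """A model is dominated if another is at least as accurate while emitting
    no more tokens (and strictly better on one axis); the frontier is
    everything not dominated."""
    frontier = []
    for name, (length, acc) in points.items():
        dominated = any(
            other != name
            and o_len <= length and o_acc >= acc
            and (o_len < length or o_acc > acc)
            for other, (o_len, o_acc) in points.items()
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier(models))  # ['model_a', 'model_b', 'model_c']; model_d is dominated
```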
 
@@ -66,6 +77,10 @@ FP8 mixed-precision training yields **15 %+ end-to-end speedup**, improved memor
 A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40 %+.
 System-level optimizations—fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry—ensure stable trillion-scale training.
 
+<p align="center">
+  <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/StIxTrsy-_MAAAAAVTAAAAgADkV7AQFr/original"/>
+</p>
+
 Pre-training used over **20 T high-quality tokens**, with **> 40 % reasoning-dense data** in later stages.
 Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**”, improving downstream reasoning stability.
 A custom **WSM (Warmup–Stable–Merge)** LR scheduler with mid-train checkpoint merging simulates LR decay and boosts generalization.
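Ling-1T's actual FP8 recipe is not spelled out in this hunk; as a generic illustration of the building block behind FP8 mixed-precision training, here is a per-tensor-scaled e4m3 round-trip, assuming a PyTorch build that ships `torch.float8_e4m3fn`.

```python
import torch

# Illustrative sketch of per-tensor-scaled FP8 (e4m3) casting. This is NOT
# the Ling-1T training code; it only shows the quantize/dequantize round-trip.
E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def to_fp8(x: torch.Tensor):
    """Scale a fp32/bf16 tensor into the e4m3 range, then cast to FP8.
    Returns the FP8 tensor plus the scale needed to invert it."""
    scale = E4M3_MAX / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Dequantize back to fp32 for high-precision accumulation."""
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4096, 4096)
w_fp8, s = to_fp8(w)
print("max abs round-trip error:", (w - from_fp8(w_fp8, s)).abs().max().item())
```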
 
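The WSM schedule itself is only named here, not specified. Below is a hedged sketch of one plausible reading: linear warmup into a constant learning rate with no decay phase, with the effect of decay instead approximated by uniformly averaging ("merging") checkpoints saved along the stable phase. All function names and hyperparameters are assumptions.

```python
# Hedged sketch of a Warmup-Stable-Merge (WSM) LR schedule: warmup, then a
# constant LR that is never decayed; the usual decay is approximated by
# weight-averaging ("merging") checkpoints from the stable phase.
# Names and numbers are illustrative, not Ling-1T's actual configuration.

def wsm_lr(step: int, peak_lr: float = 3e-4, warmup_steps: int = 2000) -> float:
    if step < warmup_steps:                 # Warmup: linear ramp to peak
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr                          # Stable: hold peak, no decay phase

def merge_checkpoints(checkpoints: list[dict]) -> dict:
    """Merge: uniform average of stable-phase checkpoints, playing the role
    that LR decay plays in a cosine or warmup-stable-decay schedule."""
    merged = {}
    for key in checkpoints[0]:
        merged[key] = sum(ckpt[key] for ckpt in checkpoints) / len(checkpoints)
    return merged

# Toy usage: average three scalar "weights" saved during the stable phase.
ckpts = [{"w": 1.0}, {"w": 1.2}, {"w": 0.9}]
print(wsm_lr(100), wsm_lr(5000), merge_checkpoints(ckpts))
```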
@@ -80,6 +95,12 @@ For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimiza
 Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
 Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
 
+<p align="center">
+  <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/o10CRK8P8hwAAAAAWwAAAAgADkV7AQFr/original"/>
+</p>
+<p align="center">
+  <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J7I6QZqI-6AAAAAAZHAAAAgADkV7AQFr/original"/>
+</p>
 
 ## Evaluation
 
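LPO's exact objective is not given in this hunk; as a hedged illustration of the one idea it does state (sentence-level rather than token- or sequence-level action units), the sketch below segments a sampled completion into sentences and applies one clipped importance ratio per sentence. The clipping form, names, and numbers are assumptions, not the published algorithm.

```python
import math
import re

# Hedged sketch of the stated LPO idea only: credit assignment at the
# *sentence* level, between GRPO's token level and GSPO's sequence level.

def split_sentences(text: str) -> list[str]:
    # Naive segmentation on end punctuation; a real system would use a
    # proper linguistic splitter.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def lpo_loss(logp_new: list[float], logp_old: list[float],
             advantage: float, clip_eps: float = 0.2) -> float:
    """One clipped importance ratio per sentence (token log-probs summed
    within each sentence), instead of per token or per whole sequence."""
    loss = 0.0
    for lp_new, lp_old in zip(logp_new, logp_old):
        ratio = math.exp(lp_new - lp_old)            # sentence-level ratio
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        loss += -min(ratio * advantage, clipped * advantage)
    return loss / len(logp_new)

completion = "First we simplify the expression. Then we bound the error. Hence the answer is 42."
sents = split_sentences(completion)
# Per-sentence summed log-probs under the new and old policies (placeholders).
print(sents, lpo_loss([-3.1, -2.7, -1.9], [-3.0, -2.9, -2.0], advantage=1.0))
```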