---
license: mit
pipeline_tag: text-generation
library_name: transformers
---
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>
</p>
<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope </a> | 🐙 <a href="https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI">Experience Now</a></p>
## Introduction
**Ling-1T** is the first flagship *non-thinking* model in the Ling 2.0 series, featuring **1 trillion total parameters** with **≈ 50 billion active parameters per token**.
Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of *efficient reasoning* and *scalable cognition*.
Pre-trained on **20 trillion+ high-quality, reasoning-dense tokens**, Ling-1T-base supports up to **128K context length** and adopts an **evolutionary chain-of-thought (Evo-CoT)** process across mid-training and post-training.
This curriculum greatly enhances the model’s efficiency and reasoning depth, allowing Ling-1T to achieve **state-of-the-art performance** on multiple complex reasoning benchmarks—balancing **accuracy** and **efficiency**.
### Flagship-Level Efficient Reasoning
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/YiXwTb4Q_vsAAAAAT-AAAAgADkV7AQFr/original"/>
</p>
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/MEh7Q5FtzbAAAAAAUQAAAAgADkV7AQFr/original"/>
</p>
We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates **superior complex reasoning ability** and an overall advantage.
On the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J8ciS5KbIrwAAAAAceAAAAgADkV7AQFr/original"/>
</p>
### Aesthetic Understanding and Front-End Generation
Ling-1T excels in visual reasoning and front-end code generation tasks, combining deep semantic understanding with precise code synthesis.
We introduce a hybrid *Syntax–Function–Aesthetics* reward mechanism, enabling the model to not only generate correct and functional code but also demonstrate a refined sense of **visual aesthetics**.
On **ArtifactsBench**, [Ling-1T](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI) ranks **first among open-source models**, and the benchmark visualizations in this card were, in fact, *generated by Ling-1T itself*.
### Emergent Intelligence at Trillion-Scale
Scaling to the trillion-parameter level has revealed strong **emergent reasoning and transfer capabilities**.
For example, in the **BFCL V3** tool-use benchmark, Ling-1T achieves **≈ 70% tool-call accuracy** with only light instruction tuning—despite having seen no large-scale trajectory data during training.
[Ling-1T](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI) can:
* Interpret complex natural-language instructions
* Transform abstract logic into functional visual components
* Generate cross-platform compatible front-end code
* Create stylistically controlled marketing copy and multi-lingual text
These capabilities form the foundation for **general, collaborative human–AI intelligence**, which we aim to advance together with the open-source community through Ling-1T’s release.
### Pre-Training at Trillion Scale
The Ling 2.0 architecture was designed from the ground up for trillion-scale efficiency, guided by the **Ling Scaling Law** ([arXiv:2507.17702](https://arxiv.org/abs/2507.17702)).
This ensures architectural and hyperparameter scalability even under **1e25–1e26 FLOPs** of compute.
Key architectural innovations include:
* **1T total / 50B active parameters** with a **1/32 MoE activation ratio**
* **MTP layers** for enhanced compositional reasoning
* **Aux-loss-free**, **sigmoid-scoring expert routing** with **zero-mean updates** (see the sketch below)
* **QK Normalization** for fully stable convergence
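The routing bullet above can be made concrete with a minimal sketch. The production implementation is not public, so the top-k value, bias step size, and exact update rule below are assumptions; the sketch only illustrates the combination of sigmoid scoring, a non-gradient selection bias in place of an auxiliary balancing loss, and a zero-mean bias update.

```python
import torch

def aux_loss_free_route(h, w_gate, bias, top_k=8, bias_lr=1e-3):
    """One MoE routing step (illustrative sketch, not the production code).

    h:      [tokens, d_model]  token hidden states
    w_gate: [d_model, n_exp]   gating projection
    bias:   [n_exp]            non-gradient bias used only for selection
    """
    scores = torch.sigmoid(h @ w_gate)          # sigmoid scoring, not softmax
    # Select experts on biased scores: the bias steers load balance,
    # so no auxiliary balancing loss term is needed.
    topk = torch.topk(scores + bias, top_k, dim=-1).indices
    gates = torch.gather(scores, -1, topk)      # gate with unbiased scores
    gates = gates / gates.sum(-1, keepdim=True)

    # Zero-mean bias update: lower the bias of overloaded experts and
    # raise it for underloaded ones, keeping the mean update at zero.
    load = torch.zeros_like(bias).scatter_add_(
        0, topk.reshape(-1), torch.ones_like(topk, dtype=bias.dtype).reshape(-1))
    bias -= bias_lr * (load - load.mean())
    return topk, gates, bias
```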
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/naA9TJe7ttIAAAAAVRAAAAgADkV7AQFr/original"/>
</p>
Ling-1T is the **largest FP8-trained foundation model** known to date.
FP8 mixed-precision training yields **15%+ end-to-end speedup**, improved memory efficiency, and maintains **≤ 0.1% loss deviation** from BF16 across **1T tokens**.
A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40%+.
System-level optimizations—fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry—ensure stable trillion-scale training.
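The FP8 recipe itself is not detailed in this card, but the core quantize/dequantize step of mixed-precision training can be illustrated with PyTorch's `float8_e4m3fn` type. This is a per-tensor sketch; the actual scaling granularity and kernels used in training are assumptions here.

```python
import torch

def fp8_quantize(w: torch.Tensor, dtype=torch.float8_e4m3fn):
    """Per-tensor FP8 quantization sketch (the real stack may use
    finer-grained, e.g. per-block, scaling)."""
    scale = torch.finfo(dtype).max / w.abs().max().clamp(min=1e-12)
    return (w * scale).to(dtype), scale

def fp8_dequantize(w8: torch.Tensor, scale: torch.Tensor):
    """Recover a higher-precision tensor for BF16/FP32 accumulation."""
    return w8.to(torch.bfloat16) / scale
```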
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/y5UVSKACgLEAAAAAVcAAAAgADkV7AQFr/original"/>
</p>
Pre-training used over **20T high-quality tokens**, with **> 40% reasoning-dense data** in later stages.
Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**”, improving downstream reasoning stability.
A custom **WSM (Warmup–Stable–Merge)** LR scheduler ([arXiv:2507.17634](https://arxiv.org/abs/2507.17634)) with mid-train checkpoint merging simulates LR decay and boosts generalization.
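As a rough illustration of the WSM idea: there is no decay phase in the schedule, and the generalization benefit usually obtained from decay is recovered by merging stable-phase checkpoints. The peak LR, warmup length, and uniform merge weights below are assumptions, not the paper's exact recipe.

```python
def wsm_lr(step, peak_lr=4e-4, warmup_steps=2000):
    """Warmup-Stable LR: linear warmup, then a constant plateau."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

def merge_checkpoints(state_dicts):
    """The 'Merge' step: uniformly average parameters from several
    stable-phase checkpoints, standing in for explicit LR decay."""
    return {
        name: sum(sd[name] for sd in state_dicts) / len(state_dicts)
        for name in state_dicts[0]
    }
```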
### Post-Training and Evo-CoT Optimization
Built upon mid-training reasoning activation, post-training adopts **Evo-CoT (Evolutionary Chain-of-Thought)** for progressive reasoning enhancement under controllable cost.
This approach continually expands the **Pareto frontier** of reasoning accuracy vs. efficiency—ideal for reflexive non-thinking models.
For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimization)**, a novel sentence-level policy optimization method.
Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
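To make the granularity difference concrete, here is a minimal sketch of a sentence-level importance ratio, the quantity a sentence-level method would clip and weight by advantages. The tensor layout and clipping threshold are assumptions, not the exact published algorithm.

```python
import torch

def lpo_sentence_ratios(logp_new, logp_old, sent_ids, eps=0.2):
    """Sentence-level importance ratios.

    logp_new, logp_old: [T] per-token log-probs under the current and
                        behavior policies
    sent_ids:           [T] int64 index of the sentence each token is in

    GRPO exponentiates per-token deltas and GSPO one whole-sequence
    delta; here the deltas are summed within each sentence instead.
    """
    n_sent = int(sent_ids.max()) + 1
    delta = torch.zeros(n_sent, dtype=logp_new.dtype, device=logp_new.device)
    delta.scatter_add_(0, sent_ids, logp_new - logp_old)
    ratio = delta.exp()                      # one ratio per sentence
    return ratio, ratio.clamp(1 - eps, 1 + eps)
```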
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/kbEWT4BGEQQAAAAAWwAAAAgADkV7AQFr/original"/>
</p>
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/aF5LRqK5LMcAAAAAZHAAAAgADkV7AQFr/original"/>
</p>
## Evaluation
Ling-1T has been extensively evaluated across **knowledge**, **code**, **math**, **reasoning**, **agent**, and **alignment** benchmarks.
It currently stands as the **best open-source flagship non-thinking model**, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/KrwiQZEDHV0AAAAAWkAAAAgADkV7AQFr/original"/>
</p>
## Model Downloads
You can download Ling-1T from the table below. For users in mainland China, the model is also available on ModelScope.cn for faster downloads.
<center>
| **Model** | **Context Length** | **Download** |
| :-------: | :----------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: |
| Ling-1T | 32K -> 128K (YaRN) | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-1T) [🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-1T) |
</center>
Note: If you are interested in previous versions, please visit the past model collections on [Hugging Face](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).
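For scripted downloads, the checkpoint can also be fetched with `huggingface_hub`; a minimal sketch, where the local directory is a placeholder:

```python
from huggingface_hub import snapshot_download

# Fetch the full Ling-1T checkpoint from the Hugging Face Hub.
snapshot_download(repo_id="inclusionAI/Ling-1T", local_dir="./Ling-1T")
```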
## Quickstart
### 🚀 Try Online
You can experience Ling-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI)
### 🔌 API Usage
You can also use Ling-1T through API calls:
```python
from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL to the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API Key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model to use in the format "provider/model-name"
    model="inclusionai/ling-1t",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```
## Deployment
### SGLang
#### Environment Preparation
We plan to contribute Ling-1T support to the official SGLang release. For now, prepare the environment as follows:
```shell
pip3 install -U sglang sgl-kernel
```
#### Run Inference
SGLang now supports both BF16 and FP8 checkpoints; the precision used depends on the dtype of the model stored in `${MODEL_PATH}`.
Here is an example of running Ling-1T across multiple GPU nodes, where the master node IP is `${MASTER_IP}` and the server port is `${PORT}`:
- Start server:
```bash
# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0
# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1
# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2
# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3
# This is only an example. Please adjust arguments according to your actual environment.
```
- Client:
```shell
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```
More usage examples can be found [here](https://docs.sglang.ai/basic_usage/send_request.html).
### vLLM
#### Environment Preparation
```bash
pip install vllm==0.11.0
```
#### Run Inference
Here is an example of deploying the model across multiple GPU nodes, where the master node IP is `${MASTER_IP}`, the server port is `${PORT}`, and the model path is `${MODEL_PATH}`:
```bash
# step 1. start ray on all nodes
# step 2. start vllm server only on node 0:
vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 32 --gpu-memory-utilization 0.85
# This is only an example, please adjust arguments according to your actual environment.
```
To handle long contexts in vLLM using YaRN, follow these two steps (a helper script follows the list):
1. Add a `rope_scaling` field to the model's `config.json` file, for example:
```json
{
...,
"rope_scaling": {
"factor": 4.0,
"original_max_position_embeddings": 32768,
"type": "yarn"
}
}
```
2. Use an additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service.
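The config edit in step 1 can also be scripted; a minimal sketch, where the checkpoint path is a placeholder:

```python
import json

config_path = "/path/to/Ling-1T/config.json"  # placeholder checkpoint path

with open(config_path) as f:
    cfg = json.load(f)

# A factor of 4.0 stretches the native 32K window to roughly 128K (4 x 32768).
cfg["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```

The server can then be started with, e.g., `--max-model-len 131072` to expose the full 128K window.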
For detailed guidance, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/).
## Limitations & Future Plans
While **[Ling-1T](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI)** has made strong progress in efficient reasoning, cross-domain generalization, and training efficiency, several limitations remain:
* **GQA-based attention**: stable for long-context reasoning but relatively costly. Future versions will adopt **hybrid attention** to improve efficiency.
* **Limited agentic ability**: the current model still has room to grow in multi-turn interaction, long-term memory, and tool use.
* **Instruction and identity issues**: occasional deviations or role confusion may occur; future updates will enhance **alignment and consistency**.
Future versions of Ling-1T will continue to evolve in architecture, reasoning, and alignment, advancing the series toward more general intelligence.
## License
This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).
## FAQ
- **Recommended temperature:** 0.7
- **Recommended top_p:** 0.95
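With the OpenAI-compatible client from the Quickstart above, these settings can be passed directly (this snippet reuses that `client` object):

```python
completion = client.chat.completions.create(
    model="inclusionai/ling-1t",
    messages=[{"role": "user", "content": "Summarize YaRN in two sentences."}],
    temperature=0.7,  # recommended sampling temperature
    top_p=0.95,       # recommended nucleus-sampling threshold
)
print(completion.choices[0].message.content)
```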