---
license: mit
pipeline_tag: text-generation
library_name: transformers
---

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>
</p>

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope </a>&nbsp;&nbsp; | &nbsp;&nbsp;🐙 <a href="https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI">Experience Now</a></p>


## Introduction

**Ling-1T** is the first flagship *non-thinking* model in the Ling 2.0 series, featuring **1 trillion total parameters** with **≈ 50 billion active parameters per token**.
Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of *efficient reasoning* and *scalable cognition*.

Pre-trained on **20 trillion+ high-quality, reasoning-dense tokens**, Ling-1T-base supports up to **128K context length** and adopts an **evolutionary chain-of-thought (Evo-CoT)** process across mid-training and post-training.
This curriculum greatly enhances the model’s reasoning efficiency and depth, allowing Ling-1T to achieve **state-of-the-art performance** on multiple complex reasoning benchmarks while balancing **accuracy** and **efficiency**.


### Flagship-Level Efficient Reasoning

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/YiXwTb4Q_vsAAAAAT-AAAAgADkV7AQFr/original"/>
<p>

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/MEh7Q5FtzbAAAAAAUQAAAAgADkV7AQFr/original"/>
<p>

We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates **superior complex reasoning ability** and an overall competitive advantage.

In the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J8ciS5KbIrwAAAAAceAAAAgADkV7AQFr/original"/>
<p>

### Aesthetic Understanding and Front-End Generation

Ling-1T excels in visual reasoning and front-end code generation tasks, combining deep semantic understanding with precise code synthesis.
We introduce a hybrid *Syntax–Function–Aesthetics* reward mechanism, enabling the model to not only generate correct and functional code but also demonstrate a refined sense of **visual aesthetics**.
On **ArtifactsBench**, [Ling-1T](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI) ranks **first among open-source models**, and the benchmark visualizations in this card were, in fact, *generated by Ling-1T itself*.


### Emergent Intelligence at Trillion-Scale

Scaling to the trillion-parameter level has revealed strong **emergent reasoning and transfer capabilities**.
For example, in the **BFCL V3** tool-use benchmark, Ling-1T achieves **≈ 70% tool-call accuracy** with only light instruction tuning—despite having seen no large-scale trajectory data during training.
[Ling-1T](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI) can:

* Interpret complex natural-language instructions
* Transform abstract logic into functional visual components
* Generate cross-platform compatible front-end code
* Create stylistically controlled marketing copy and multi-lingual text

These capabilities form the foundation for **general, collaborative human–AI intelligence**, which we aim to advance together with the open-source community through Ling-1T’s release.
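For illustration, the sketch below issues a single tool-call request through the OpenAI-compatible endpoint described later in the Quickstart section. The `get_weather` schema is hypothetical, and whether a given endpoint forwards the `tools` parameter is an assumption; treat this as a usage sketch, not an official example.

```python
from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<your ZENMUX_API_KEY>")

# A single illustrative tool schema; `get_weather` is a hypothetical function.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="inclusionai/ling-1t",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call appears here.
print(completion.choices[0].message.tool_calls)
```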


### Pre-Training at Trillion Scale

The Ling 2.0 architecture was designed from the ground up for trillion-scale efficiency, guided by the **Ling Scaling Law** ([arXiv:2507.17702](https://arxiv.org/abs/2507.17702)).
This ensures architectural and hyperparameter scalability even under **1e25–1e26 FLOPs** of compute.

Key architectural innovations include:

* **1T total / 50B active parameters** with a **1/32 MoE activation ratio**
* **MTP layers** for enhanced compositional reasoning
* **Aux-loss-free**, **sigmoid-scoring expert routing** with **zero-mean updates**
* **QK Normalization** for fully stable convergence
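To make the routing design concrete, here is a minimal PyTorch sketch of sigmoid-scoring, aux-loss-free top-k routing with zero-mean bias updates. The shapes, step size, and exact update rule are illustrative assumptions, not Ling-1T's actual implementation.

```python
import torch

def route(hidden, gate_weight, bias, top_k=8):
    # hidden: [n_tokens, d_model]; gate_weight: [n_experts, d_model]; bias: [n_experts]
    scores = torch.sigmoid(hidden @ gate_weight.T)          # sigmoid scoring, no softmax
    topk_idx = (scores + bias).topk(top_k, dim=-1).indices  # bias steers selection only
    gates = torch.gather(scores, -1, topk_idx)              # combine weights use raw scores
    return topk_idx, gates / gates.sum(-1, keepdim=True)

def update_bias(bias, topk_idx, n_experts, step=1e-3):
    # Zero-mean update: raise bias of under-loaded experts, lower over-loaded ones,
    # balancing load without an auxiliary loss term.
    load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
    delta = step * (load.mean() - load).sign()
    return bias + (delta - delta.mean())

# Toy usage with illustrative sizes
tokens, d, experts = 16, 64, 32
h, w, b = torch.randn(tokens, d), torch.randn(experts, d), torch.zeros(experts)
idx, gates = route(h, w, b, top_k=2)
b = update_bias(b, idx, experts)
```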

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/naA9TJe7ttIAAAAAVRAAAAgADkV7AQFr/original"/>
<p>

Ling-1T is the **largest FP8-trained foundation model** known to date.
FP8 mixed-precision training yields a **15%+ end-to-end speedup** and improved memory efficiency while maintaining **≤ 0.1% loss deviation** from BF16 across **1T tokens**.
A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40%+.
System-level optimizations—fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry—ensure stable trillion-scale training.
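For intuition on FP8 quantization, the sketch below converts a tensor with simple per-tensor scaling (PyTorch ≥ 2.1 for `torch.float8_e4m3fn`). The scaling granularity and format choice are assumptions; a production FP8 training stack is considerably more involved.

```python
import torch

E4M3_MAX = 448.0  # largest finite magnitude representable in float8_e4m3fn

def to_fp8(x: torch.Tensor):
    # Scale into FP8 range, quantize, and keep the scale for dequantization.
    scale = E4M3_MAX / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    return x_fp8.to(torch.float32) / scale

x = torch.randn(4096, 4096)
x8, s = to_fp8(x)
rel_err = ((from_fp8(x8, s) - x).abs().mean() / x.abs().mean()).item()
print(f"mean relative quantization error: {rel_err:.2%}")
```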

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/y5UVSKACgLEAAAAAVcAAAAgADkV7AQFr/original"/>
<p>

Pre-training used over **20T high-quality tokens**, with **> 40% reasoning-dense data** in later stages.
Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**”, improving downstream reasoning stability.
A custom **WSM (Warmup–Stable–Merge)** LR scheduler ([arXiv:2507.17634](https://arxiv.org/abs/2507.17634)) with mid-training checkpoint merging simulates LR decay and boosts generalization.
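Below is a minimal sketch of the checkpoint-merging idea, assuming uniform averaging of stable-phase checkpoints; the actual WSM merge weights and schedule follow the paper and may differ.

```python
import torch

def merge_checkpoints(paths, weights=None):
    # Average parameter tensors across checkpoints (uniform by default),
    # approximating the generalization effect of LR decay.
    states = [torch.load(p, map_location="cpu") for p in paths]
    weights = weights or [1.0 / len(states)] * len(states)
    return {
        key: sum(w * s[key].float() for w, s in zip(weights, states))
        for key in states[0]
    }

# Hypothetical checkpoint names, for illustration only:
# merged = merge_checkpoints(["step_90k.pt", "step_95k.pt", "step_100k.pt"])
```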


### Post-Training and Evo-CoT Optimization

Built upon mid-training reasoning activation, post-training adopts **Evo-CoT (Evolutionary Chain-of-Thought)** for progressive reasoning enhancement under controllable cost.
This approach continually expands the **Pareto frontier** of reasoning accuracy vs. efficiency—ideal for reflexive non-thinking models.

For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimization)**, a novel sentence-level policy optimization method.
Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
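To convey the sentence-as-action-unit idea, here is a minimal REINFORCE-style sketch in which all tokens of a sentence share a single advantage. The real LPO objective (advantage estimation, clipping, reference policy) is not specified in this card, so treat this purely as intuition.

```python
import torch

def lpo_style_loss(token_logps, sentence_ids, sentence_advantages):
    # token_logps: [T] log-probs of sampled tokens
    # sentence_ids: [T] index of the sentence each token belongs to
    # sentence_advantages: [S] one advantage per sentence
    loss = torch.zeros(())
    for s, adv in enumerate(sentence_advantages):
        mask = sentence_ids == s
        loss = loss - adv * token_logps[mask].sum()  # one reward signal per sentence
    return loss

logps = torch.randn(6)                   # 6 sampled tokens
ids = torch.tensor([0, 0, 0, 1, 1, 1])   # two sentences of 3 tokens each
print(lpo_style_loss(logps, ids, torch.tensor([0.8, -0.3])))
```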

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/kbEWT4BGEQQAAAAAWwAAAAgADkV7AQFr/original"/>
<p>
<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/aF5LRqK5LMcAAAAAZHAAAAgADkV7AQFr/original"/>
<p>

## Evaluation

Ling-1T has been extensively evaluated across **knowledge**, **code**, **math**, **reasoning**, **agent**, and **alignment** benchmarks.
It currently stands as the **best open-source flagship non-thinking model**, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/KrwiQZEDHV0AAAAAWkAAAAgADkV7AQFr/original"/>
<p>


## Model Downloads

You can download Ling-1T from the table below. If you are located in mainland China, we also provide the model on ModelScope.cn for faster downloads.

<center>

| **Model** | **Context Length** |                                                                 **Download**                                                                  |
| :-------: | :----------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: |
|  Ling-1T  | 32K -> 128K (YaRN) | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-1T) &nbsp;&nbsp; [🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-1T) |

</center>

Note: If you are interested in previous versions, please visit the past model collections on [Hugging Face](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).


## Quickstart

### 🚀 Try Online

You can experience Ling-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI)

### 🔌 API Usage

You can also use Ling-1T through API calls:

```python
from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL to the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API Key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model to use in the format "provider/model-name"
    model="inclusionai/ling-1t",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```

## Deployment

### SGLang

#### Environment Preparation

We plan to submit support for our model to the official SGLang release. For now, prepare the environment as follows:
```shell
pip3 install -U sglang sgl-kernel
```

#### Run Inference

SGLang now supports both the BF16 and FP8 versions of the model; which one is used depends on the dtype of the checkpoint at ${MODEL_PATH}.

Here is an example of running Ling-1T across multiple GPU nodes, where the master node IP is ${MASTER_IP} and the server port is ${PORT}:

- Start server:
```bash
# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0 

# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1 

# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2 

# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3

# This is only an example. Please adjust arguments according to your actual environment.
```

- Client:

```shell
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```

More usage examples can be found [here](https://docs.sglang.ai/basic_usage/send_request.html).

### vLLM

#### Environment Preparation

```bash
pip install vllm==0.11.0
```

#### Run Inference

Here is an example of deploying the model across multiple GPU nodes, where the master node IP is ${MASTER_IP}, the server port is ${PORT}, and the model path is ${MODEL_PATH}:

```bash
# step 1. start ray on all nodes

# step 2. start vllm server only on node 0:
vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 32 --gpu-memory-utilization 0.85

# This is only an example, please adjust arguments according to your actual environment.
```

To handle long contexts in vLLM using YaRN, follow these two steps:
1. Add a `rope_scaling` field to the model's `config.json` file, for example:
```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```
2. Use an additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service.
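For example, with the `rope_scaling` config above (32K × factor 4 = 128K), the serve command might look like the following; adjust the value to your deployment:

```bash
vllm serve $MODEL_PATH --port $PORT --trust-remote-code --tensor-parallel-size 32 --max-model-len 131072
```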

For detailed guidance, please refer to the vLLM [documentation](https://docs.vllm.ai/en/latest/).



## Limitations & Future Plans

While **[Ling-1T](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI)** has made strong progress in efficient reasoning, cross-domain generalization, and training efficiency, several limitations remain:

* **GQA-based attention**: stable for long-context reasoning but relatively costly. Future versions will adopt **hybrid attention** to improve efficiency.
* **Limited agentic ability**: the current model has room to grow in multi-turn interaction, long-term memory, and tool use.
* **Instruction and identity issues**: occasional deviations or role confusion may occur; future updates will enhance **alignment and consistency**.

Future versions of Ling-1T will continue to evolve in architecture, reasoning, and alignment, advancing the series toward more general intelligence.


## License

This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).

## FAQ
Recommended temperature? **0.7**  
Recommended top_p? **0.95**
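As a usage sketch, these settings map directly onto the OpenAI-compatible call from the API Usage section (reusing the `client` initialized there):

```python
completion = client.chat.completions.create(
    model="inclusionai/ling-1t",
    temperature=0.7,   # recommended
    top_p=0.95,        # recommended
    messages=[{"role": "user", "content": "Hello!"}],
)
```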