Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
Abstract
Ling 2.0, a reasoning-oriented language model series, achieves high efficiency and accuracy through a Mixture-of-Experts paradigm, sparse activation, and innovative training techniques.
We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models - Ling-mini-2.0, Ling-flash-2.0, and Ling-1T - ranging from 16B to 1T total parameters and achieving up to 7-fold gains in active-compute efficiency over dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with multi-token prediction (MTP) for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.
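As a rough illustration of the sparse-activation idea behind the series, the sketch below shows a generic top-k routed MoE feed-forward layer: each token is processed by only a few of the available experts, so the active compute per token is a small fraction of the total parameter count. This is a minimal PyTorch sketch under assumed hyperparameters (64 experts, top-4 routing, the hypothetical `TopKMoE` name), not the Ling 2.0 architecture or its actual expert configuration.

```python
# Minimal sketch of sparse Mixture-of-Experts (MoE) activation with top-k routing.
# Hypothetical layer for illustration only; expert count, hidden sizes, and the
# TopKMoE name are assumptions, not the Ling 2.0 configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 64, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_logits = self.router(x)                            # (num_tokens, num_experts)
        weights, expert_idx = gate_logits.topk(self.top_k, -1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)                    # normalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# With 64 experts and top-4 routing, only ~1/16 of the expert parameters are
# active for any given token, which is the source of the active-compute savings
# relative to a dense FFN of the same total size.
tokens = torch.randn(8, 512)
layer = TopKMoE(d_model=512, d_ff=2048)
print(layer(tokens).shape)  # torch.Size([8, 512])
```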
Community
Technical report for the Ling 2.0 series, covering model architecture, pre-training, training infrastructure, post-training of the reflex-grade non-thinking version, and comprehensive evaluations.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Scaling Latent Reasoning via Looped Language Models (2025)
- CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs (2025)
- Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model (2025)
- Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning (2025)
- Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models (2025)
- Apriel-1.5-15b-Thinker (2025)
- Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation (2025)