---
license: mit
base_model:
- inclusionAI/Ling-mini-base-2.0-20T
pipeline_tag: text-generation
library_name: transformers
---

# Ring-mini-2.0

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>
</p>

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;🚀 <a href="https://zenmux.ai/inclusionai/ring-mini-2.0?utm_source=hf_inclusionAI">Experience Now</a></p>
Today, we officially release Ring-mini-2.0, a high-performance, reasoning-oriented MoE model deeply optimized on the Ling 2.0 architecture. With only 16.8B total parameters and 1.4B activated parameters, it achieves comprehensive reasoning capabilities comparable to those of dense models below the 10B scale. It excels particularly at logical reasoning, code generation, and mathematical tasks, while supporting 128K long-context processing and 300+ tokens/s high-speed generation.

## Enhanced Reasoning: Joint Training with SFT + RLVR + RLHF

Built upon Ling-mini-2.0-base, Ring-mini-2.0 is further trained with Long-CoT SFT and jointly optimized with a more stable, sustained RLVR plus RLHF, significantly improving the stability and generalization of its complex reasoning. On multiple challenging benchmarks (LiveCodeBench, AIME 2025, GPQA, ARC-AGI-v1, etc.), it outperforms dense models below 10B parameters and even rivals larger MoE models (e.g., gpt-oss-20B-medium) at comparable output lengths, excelling particularly at logical reasoning.

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/O2YKQqkdEvAAAAAASzAAAAgADod9AQFr/original" width="1000"/>
</p>
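
For intuition, the "verifiable rewards" in RLVR come from a programmatic checker rather than a learned reward model: each sampled completion is scored by verifying its final answer against a reference. Below is a minimal, hypothetical sketch of such a verifier for math-style answers; it is illustrative only, not the reward actually used to train Ring-mini-2.0.

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Hypothetical RLVR-style reward: 1.0 if the completion's final
    \\boxed{...} answer matches the reference exactly, else 0.0."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if not matches:
        return 0.0  # no parseable final answer, no reward
    return 1.0 if matches[-1].strip() == gold_answer.strip() else 0.0

# A correct final answer earns full reward; anything else earns none.
assert verifiable_reward(r"... so the result is \boxed{42}", "42") == 1.0
assert verifiable_reward("I am not sure.", "42") == 0.0
```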
 
## High Sparsity, High-Speed Generation

Inheriting the efficient MoE design of the Ling 2.0 series, Ring-mini-2.0 activates only 1.4B parameters yet matches the performance of 7–8B dense models, thanks to architectural optimizations such as a 1/32 expert activation ratio and MTP layers. This low-activation, high-sparsity design lets Ring-mini-2.0 deliver 300+ tokens/s of throughput when deployed on H20 GPUs; with Expert Dual Streaming inference optimization, this can be further boosted to 500+ tokens/s, significantly reducing inference costs for high-concurrency deployments of thinking models. Additionally, with YaRN extrapolation it supports 128K long-context processing, achieving a relative speedup of up to 7x in long-output scenarios.

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/gjJKSpFVphEAAAAAgdAAAAgADod9AQFr/original" width="1000"/>
</p>

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/o-vGQadCF_4AAAAAgLAAAAgADod9AQFr/original" width="1000"/>
</p>
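
For long-context use with `transformers`, YaRN is typically enabled by setting `rope_scaling` in the model config before loading. The exact scaling factor and base context length for Ring-mini-2.0 are not stated here, so the values below are placeholders; treat this as a sketch rather than a verified configuration.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "inclusionAI/Ring-mini-2.0"

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
# Placeholder YaRN settings: substitute the officially recommended
# scaling factor and original context length for this model.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```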

## Model Downloads

<div align="center">

| **Model**     | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :-----------: | :---------------: | :-------------------: | :----------------: | :----------: |
| Ring-mini-2.0 | 16.8B             | 1.4B                  | 128K               | [🤗 Hugging Face](https://huggingface.co/inclusionAI/Ring-mini-2.0) <br>[🤖 ModelScope](https://modelscope.cn/models/inclusionAI/Ring-mini-2.0) |

</div>
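
If you prefer to fetch the weights ahead of time, `huggingface_hub` can mirror the repository to a local directory (the target path below is just an example):

```python
from huggingface_hub import snapshot_download

# Download every file in the model repo to a local directory.
local_path = snapshot_download(
    repo_id="inclusionAI/Ring-mini-2.0",
    local_dir="./Ring-mini-2.0",  # example path
)
print(f"Model files downloaded to {local_path}")
```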

## Quickstart

### 🚀 Try Online

You can experience Ring-mini-2.0 online at [ZenMux](https://zenmux.ai/inclusionai/ring-mini-2.0?utm_source=hf_inclusionAI).

### 🔌 API Usage

You can also call Ring-mini-2.0 through an OpenAI-compatible API:

```python
from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL at the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model in "provider/model-name" format
    model="inclusionai/ring-mini-2.0",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```

### 🤗 Hugging Face Transformers

Here is a code snippet showing how to chat with the model using `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ring-mini-2.0"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt", return_token_type_ids=False).to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## License

This code repository is licensed under [the MIT License](https://huggingface.co/inclusionAI/Ring-mini-2.0/blob/main/LICENSE).

## Citation

TODO