Commit f1cdb45 by liushaowei · 1 Parent(s): 6dead1a
THIRD_PARTY_NOTICES.md CHANGED
@@ -1,6 +1,6 @@
 # THIRD_PARTY_NOTICES
 
-This file lists third-party software contained in Kimi-K2 along with their licenses, in compliance with the redistribution clauses of those licenses.
+This file lists third-party software contained in Kimi-K2-Thinking along with their licenses, in compliance with the redistribution clauses of those licenses.
 
 ---
 
config.json CHANGED
@@ -23,7 +23,7 @@
   "do_sample": false,
   "early_stopping": false,
   "encoder_no_repeat_ngram_size": 0,
-  "eos_token_id": 163585,
+  "eos_token_id": 163586,
   "ep_size": 1,
   "exponential_decay_length_penalty": null,
   "finetuning_task": null,
@@ -147,4 +147,4 @@
   "use_cache": true,
   "v_head_dim": 128,
   "vocab_size": 163840
-}
+}
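The change above aligns `eos_token_id` in config.json (163585 → 163586) with the value generation_config.json already uses. A minimal sketch of that consistency check, with the values inlined from this diff rather than read from disk (a real check would `json.load()` both files from the model directory):

```python
# Values inlined from this commit's config.json and generation_config.json;
# these dicts stand in for the parsed files.
config = {"eos_token_id": 163586, "vocab_size": 163840}
generation_config = {"eos_token_id": 163586, "max_length": 262144}

# The two EOS ids must agree, and the id must be a valid index into the vocab.
assert config["eos_token_id"] == generation_config["eos_token_id"]
assert 0 <= config["eos_token_id"] < config["vocab_size"]
print("eos_token_id:", config["eos_token_id"])
```

A mismatched `eos_token_id` between the two configs can leave generation running past the intended stop token, which is why this commit pins both to 163586.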
docs/deploy_guidance.md CHANGED
@@ -6,7 +6,7 @@
 
 ## vLLM Deployment
 
-The smallest deployment unit for Kimi-K2 INT4 weights with 256k seqlen on mainstream H200 platform is a cluster with 8 GPUs with Tensor Parallel (TP).
+The smallest deployment unit for Kimi-K2-Thinking INT4 weights with 256k seqlen on mainstream H200 platform is a cluster with 8 GPUs with Tensor Parallel (TP).
 Running parameters for this environment are provided below. For other parallelism strategies, please refer to updates of official documents.
 
 Nightly version is needed for reasoning parser.
@@ -17,7 +17,6 @@ Here is a sample launch command with TP=8:
 
 ``` bash
 
-# node 0:
 vllm serve $MODEL_PATH \
   --served-model-name kimi-k2-thinking \
   --trust-remote-code \
@@ -47,7 +46,6 @@ Nightly version is needed for reasoning parser.
 Here is the simple example code to run TP8 on H200 in a sigle node:
 
 ``` bash
-# Node 0
 python -m sglang.launch_server --model-path $MODEL_PATH --tp 8 --trust-remote-code --tool-call-parser kimi_k2 --reasoning_parser kimi_k2
 ```
 
docs/tool_call_guidance.md CHANGED
@@ -49,7 +49,7 @@ The results obtained by the user after calling the tools should be added to `mes
 ```python
 import json
 from openai import OpenAI
-model_name='moonshotai/Kimi-K2-Instruct'
+model_name='moonshotai/Kimi-K2-Thinking'
 client = OpenAI(base_url=endpoint,
                 api_key='xxx')
 
@@ -255,4 +255,4 @@ If you're using vLLM or SGLang and they are generating random tool-call IDs, upg
 
 #### Q3: My tool call id is correct, but I still get crashed in multiturn tool call.
 
-Please describe your situation in the [discussion](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905/discussions)
+Please describe your situation in the [discussion](https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions)
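The model-name swap above can be exercised without a running server. A minimal sketch that only builds the request body the guide's OpenAI-client example would send (`endpoint` and the API key in the guide are placeholders, so no network call is made here):

```python
import json

# Updated model name from this commit.
model_name = 'moonshotai/Kimi-K2-Thinking'

# Chat-completion request body as the guide's client.chat.completions.create
# call would serialize it (payload construction only, no HTTP).
payload = {
    "model": model_name,
    "messages": [{"role": "user", "content": "What is the weather like today?"}],
}
body = json.dumps(payload)
assert "Kimi-K2-Thinking" in body
```

Serving stacks route by this `model` field, so it must match the `--served-model-name` (or repo id) the server was launched with; the stale `Kimi-K2-Instruct` name would be rejected by a server hosting the Thinking model.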
generation_config.json CHANGED
@@ -1,4 +1,4 @@
 {
-  "max_length": 131072,
+  "max_length": 262144,
   "eos_token_id": 163586
-}
+}
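The new `max_length` matches the 256k sequence length quoted in docs/deploy_guidance.md. A quick arithmetic check (assuming the usual 1k = 1024 tokens convention):

```python
max_length = 262144      # new value from this commit
old_max_length = 131072  # previous value

# 262144 = 256 * 1024, i.e. the 256k seqlen cited in the deploy guide,
# and exactly double the old 128k window.
assert max_length == 256 * 1024
assert max_length == 2 * old_max_length
print(max_length // 1024, "k")  # → 256 k
```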