update

Files changed (5) hide show

THIRD_PARTY_NOTICES.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # THIRD_PARTY_NOTICES
-This file lists third-party software contained in Kimi-K2 along with their licenses, in compliance with the redistribution clauses of those licenses.
 ---

 # THIRD_PARTY_NOTICES
+This file lists third-party software contained in Kimi-K2-Thinking along with their licenses, in compliance with the redistribution clauses of those licenses.
 ---

config.json CHANGED Viewed

@@ -23,7 +23,7 @@
   "do_sample": false,
   "early_stopping": false,
   "encoder_no_repeat_ngram_size": 0,
-  "eos_token_id": 163585,
   "ep_size": 1,
   "exponential_decay_length_penalty": null,
   "finetuning_task": null,
@@ -147,4 +147,4 @@
   "use_cache": true,
   "v_head_dim": 128,
   "vocab_size": 163840
-}

   "do_sample": false,
   "early_stopping": false,
   "encoder_no_repeat_ngram_size": 0,
+  "eos_token_id": 163586,
   "ep_size": 1,
   "exponential_decay_length_penalty": null,
   "finetuning_task": null,
   "use_cache": true,
   "v_head_dim": 128,
   "vocab_size": 163840
+}

docs/deploy_guidance.md CHANGED Viewed

@@ -6,7 +6,7 @@
 ## vLLM Deployment
-The smallest deployment unit for Kimi-K2 INT4 weights with 256k seqlen on mainstream H200 platform is a cluster with 8 GPUs with Tensor Parallel (TP).
 Running parameters for this environment are provided below. For other parallelism strategies, please refer to updates of official documents.
 Nightly version is needed for reasoning parser.
@@ -17,7 +17,6 @@ Here is a sample launch command with TP=8:
 ``` bash
-# node 0:
 vllm serve $MODEL_PATH \
   --served-model-name kimi-k2-thinking \
   --trust-remote-code \
@@ -47,7 +46,6 @@ Nightly version is needed for reasoning parser.
 Here is the simple example code to run TP8 on H200 in a sigle node:
 ``` bash
-# Node 0
 python -m sglang.launch_server --model-path $MODEL_PATH --tp 8 --trust-remote-code  --tool-call-parser kimi_k2 --reasoning_parser kimi_k2
 ```

 ## vLLM Deployment
+The smallest deployment unit for Kimi-K2-Thinking INT4 weights with 256k seqlen on mainstream H200 platform is a cluster with 8 GPUs with Tensor Parallel (TP).
 Running parameters for this environment are provided below. For other parallelism strategies, please refer to updates of official documents.
 Nightly version is needed for reasoning parser.
 ``` bash
 vllm serve $MODEL_PATH \
   --served-model-name kimi-k2-thinking \
   --trust-remote-code \
 Here is the simple example code to run TP8 on H200 in a sigle node:
 ``` bash
 python -m sglang.launch_server --model-path $MODEL_PATH --tp 8 --trust-remote-code  --tool-call-parser kimi_k2 --reasoning_parser kimi_k2
 ```

docs/tool_call_guidance.md CHANGED Viewed

@@ -49,7 +49,7 @@ The results obtained by the user after calling the tools should be added to `mes
 ```python
 import json
 from openai import OpenAI
-model_name='moonshotai/Kimi-K2-Instruct'
 client = OpenAI(base_url=endpoint,
                         api_key='xxx')
@@ -255,4 +255,4 @@ If you're using vLLM or SGLang and they are generating random tool-call IDs, upg
 #### Q3: My tool call id is correct, but I still get crashed in multiturn tool call.
-Please describe your situation in the [discussion](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905/discussions)

 ```python
 import json
 from openai import OpenAI
+model_name='moonshotai/Kimi-K2-Thinking'
 client = OpenAI(base_url=endpoint,
                         api_key='xxx')
 #### Q3: My tool call id is correct, but I still get crashed in multiturn tool call.
+Please describe your situation in the [discussion](https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions)

generation_config.json CHANGED Viewed

@@ -1,4 +1,4 @@
 {
-  "max_length": 131072,
   "eos_token_id": 163586
-}

 {
+  "max_length": 262144,
   "eos_token_id": 163586
+}