jan-hq committed on
Commit 089e37c · verified · 1 parent: 9a9be4b

Update README.md

Files changed (1): README.md +6 -1
README.md CHANGED
@@ -49,8 +49,9 @@ Jan-Nano-128k is fully supported by [Jan - beta build](https://www.jan.ai/docs/d
 
 For additional tutorials and community guidance, visit our [Discussion Forums](https://huggingface.co/Menlo/Jan-nano-128k/discussions).
 
-### VLLM Deployment
+### Deployment
 
+Deploy using VLLM:
 ```bash
 vllm serve Menlo/Jan-nano-128k \
 --host 0.0.0.0 \
@@ -60,6 +61,10 @@ vllm serve Menlo/Jan-nano-128k \
 --rope-scaling '{"rope_type":"yarn","factor":3.2,"original_max_position_embeddings":40960}' --max-model-len 131072
 ```
 
+Or `llama-server` from `llama.cpp`:
+```bash
+llama-server ... --rope-scaling yarn --rope-scale 3.2 --yarn-orig-ctx 40960
+```
 **Note:** The chat template is included in the tokenizer. For troubleshooting, download the [Non-think chat template](https://qwen.readthedocs.io/en/latest/_downloads/c101120b5bebcc2f12ec504fc93a965e/qwen3_nonthinking.jinja).
 
 ### Recommended Sampling Parameters
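
Both launch commands in this diff encode the same YARN context extension, so the numbers can be cross-checked: the native 40960-token window (`original_max_position_embeddings` in the vLLM JSON, `--yarn-orig-ctx` in llama.cpp) times the 3.2 scale factor (`factor` / `--rope-scale`) gives the 131072 passed to `--max-model-len`. A minimal sanity-check sketch (illustrative only, not part of the commit):

```python
# Cross-check the YARN rope-scaling arithmetic shared by both commands.
original_ctx = 40960   # original_max_position_embeddings / --yarn-orig-ctx
factor = 3.2           # "factor" in the vLLM JSON / --rope-scale in llama.cpp
max_model_len = int(original_ctx * factor)
print(max_model_len)   # matches --max-model-len 131072
```

If either value is changed, `--max-model-len` should be updated to stay at or below the product, or vLLM will reject requests beyond the scaled window.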