Update README.md
README.md CHANGED
@@ -49,8 +49,9 @@ Jan-Nano-128k is fully supported by [Jan - beta build](https://www.jan.ai/docs/d
 
 For additional tutorials and community guidance, visit our [Discussion Forums](https://huggingface.co/Menlo/Jan-nano-128k/discussions).
 
-###
+### Deployment
 
+Deploy using VLLM:
 ```bash
 vllm serve Menlo/Jan-nano-128k \
   --host 0.0.0.0 \
@@ -60,6 +61,10 @@ vllm serve Menlo/Jan-nano-128k \
   --rope-scaling '{"rope_type":"yarn","factor":3.2,"original_max_position_embeddings":40960}' --max-model-len 131072
 ```
 
+Or `llama-server` from `llama.cpp`:
+```bash
+llama-server ... --rope-scaling yarn --rope-scale 3.2 --yarn-orig-ctx 40960
+```
 **Note:** The chat template is included in the tokenizer. For troubleshooting, download the [Non-think chat template](https://qwen.readthedocs.io/en/latest/_downloads/c101120b5bebcc2f12ec504fc93a965e/qwen3_nonthinking.jinja).
 
 ### Recommended Sampling Parameters
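The YARN numbers in the added commands are mutually consistent: the scaling factor times the original context window gives the served context, 3.2 × 40960 = 131072, which is exactly the `--max-model-len` passed to vllm. As a quick sanity check once `vllm serve` is running, the sketch below queries its OpenAI-compatible endpoint; the port is an assumption, since the diff sets only `--host` and vllm's default of 8000 is presumed.

```bash
# Sketch: smoke-test the vllm server started by the command in the diff.
# Port 8000 is vllm's default and an assumption here; the diff sets only --host.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Menlo/Jan-nano-128k",
        "messages": [{"role": "user", "content": "Hello, Jan-Nano!"}],
        "max_tokens": 64
      }'
```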
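The `llama-server` line in the diff deliberately elides the rest of the invocation with `...`. A hedged expansion under stated assumptions is sketched below: only the three rope flags come from the diff; the GGUF path is a placeholder, and the context size is assumed to mirror vllm's `--max-model-len`.

```bash
# Sketch only: the diff elides everything but the rope flags.
# The GGUF path is a placeholder, and --ctx-size 131072 is an assumption
# chosen to match the vllm --max-model-len above.
llama-server \
  -m ./jan-nano-128k.gguf \
  --ctx-size 131072 \
  --rope-scaling yarn \
  --rope-scale 3.2 \
  --yarn-orig-ctx 40960
```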