Update README.md
README.md CHANGED
@@ -49,8 +49,9 @@ Jan-Nano-128k is fully supported by [Jan - beta build](https://www.jan.ai/docs/d
 
 For additional tutorials and community guidance, visit our [Discussion Forums](https://huggingface.co/Menlo/Jan-nano-128k/discussions).
 
-###
+### Deployment
 
+Deploy using VLLM:
 ```bash
 vllm serve Menlo/Jan-nano-128k \
   --host 0.0.0.0 \
@@ -60,6 +61,10 @@ vllm serve Menlo/Jan-nano-128k \
   --rope-scaling '{"rope_type":"yarn","factor":3.2,"original_max_position_embeddings":40960}' --max-model-len 131072
 ```
 
+Or `llama-server` from `llama.cpp`:
+```bash
+llama-server ... --rope-scaling yarn --rope-scale 3.2 --yarn-orig-ctx 40960
+```
 **Note:** The chat template is included in the tokenizer. For troubleshooting, download the [Non-think chat template](https://qwen.readthedocs.io/en/latest/_downloads/c101120b5bebcc2f12ec504fc93a965e/qwen3_nonthinking.jinja).
 
 ### Recommended Sampling Parameters
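The YARN numbers in the added commands are mutually consistent: the scaling factor times the original context window gives the served context, 3.2 × 40960 = 131072, which is exactly the `--max-model-len` passed to vllm. As a quick sanity check once `vllm serve` is running, the sketch below queries its OpenAI-compatible endpoint; the port is an assumption, since the diff sets only `--host` and vllm's default of 8000 is presumed.

```bash
# Sketch: smoke-test the vllm server started by the command in the diff.
# Port 8000 is vllm's default and an assumption here; the diff sets only --host.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Menlo/Jan-nano-128k",
        "messages": [{"role": "user", "content": "Hello, Jan-Nano!"}],
        "max_tokens": 64
      }'
```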
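The `llama-server` line in the diff deliberately elides the rest of the invocation with `...`. A hedged expansion under stated assumptions is sketched below: only the three rope flags come from the diff; the GGUF path is a placeholder, and the context size is assumed to mirror vllm's `--max-model-len`.

```bash
# Sketch only: the diff elides everything but the rope flags.
# The GGUF path is a placeholder, and --ctx-size 131072 is an assumption
# chosen to match the vllm --max-model-len above.
llama-server \
  -m ./jan-nano-128k.gguf \
  --ctx-size 131072 \
  --rope-scaling yarn \
  --rope-scale 3.2 \
  --yarn-orig-ctx 40960
```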