Update README.md
README.md

@@ -230,7 +230,7 @@ Here is the example to deploy the model with multiple GPU nodes, where the maste
 # step 1. start ray on all nodes
 
 # step 2. start vllm server only on node 0:
-vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size
+vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 32 --gpu-memory-utilization 0.85
 
 # This is only an example, please adjust arguments according to your actual environment.
 ```