fix(float16): Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in the CLI, for example: --dtype=half.
runner.sh
CHANGED
@@ -59,7 +59,7 @@ python -u /app/openai_compatible_api_server.py \
  --port 7860 \
  --max-num-batched-tokens 32768 \
  --max-model-len 32768 \
- --dtype
+ --dtype float16 \
  --enforce-eager \
  --gpu-memory-utilization 0.9 \
  --enable-prefix-caching \
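The commit hardcodes float16 because the Tesla T4 (compute capability 7.5) falls below the 8.0 threshold quoted in the commit message. If the same script might also run on newer GPUs, the dtype could be chosen at startup instead. The following is a minimal sketch, not part of runner.sh: `pick_dtype` is a hypothetical helper, and the commented `nvidia-smi` query assumes a driver recent enough to expose the `compute_cap` field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original runner.sh): map a CUDA compute
# capability string such as "7.5" or "8.0" to a dtype the GPU can run.
pick_dtype() {
  local cap="$1"
  local major="${cap%%.*}"   # text before the first dot, e.g. "7" from "7.5"
  # bfloat16 needs compute capability >= 8.0; any other value, including an
  # empty or non-numeric one, falls back to float16.
  if [ "$major" -ge 8 ] 2>/dev/null; then
    echo "bfloat16"
  else
    echo "float16"
  fi
}

# Possible wiring (assumption: this nvidia-smi supports the compute_cap field):
#   CAP="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
#   python -u /app/openai_compatible_api_server.py --dtype "$(pick_dtype "$CAP")" ...
```

Falling back to float16 on unreadable input keeps the server startable on hardware like the T4, at the cost of not using bfloat16 when detection fails on a newer GPU.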