Spaces: yusufs/llama32-3b-instruct
yusufs committed 20 days ago

Commit 13e81e6 (verified)

1 Parent(s): ae21665

Update Dockerfile

Files changed (1)

Dockerfile CHANGED

@@ -61,8 +61,12 @@ RUN pip install uv setuptools
 # Install vLLM
 # RUN uv pip install --system vllm==0.10.0 torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
 # Downgrade triton because following error occured when using triton==3.3.1
 # https://github.com/vllm-project/vllm/issues/20259#issuecomment-3157159183
 # /usr/local/lib/python3.12/dist-packages/vllm/attention/ops/prefix_prefill.py:36:0: error: Failures have been detected while processing an MLIR pass pipeline
 # /usr/local/lib/python3.12/dist-packages/vllm/attention/ops/prefix_prefill.py:36:0: note: Pipeline failed while executing [`ConvertTritonGPUToLLVM` on 'builtin.module' operation]: reproducer generated at `std::errs, please share the reproducer above with Triton project.`
 # INFO:     10.16.9.222:28100 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
@@ -177,7 +181,7 @@ RUN pip install uv setuptools
 # INFO:     Waiting for application shutdown.
 # INFO:     Application shutdown complete.
 # INFO:     Finished server process [27]
-RUN uv pip install --system --index-strategy unsafe-best-match vllm==0.10.0 triton==3.2 --extra-index-url https://download.pytorch.org/whl/cu128
 # # Then, install xformers with the --no-build-isolation flag
 # RUN uv pip install --system \

 # Install vLLM
 # RUN uv pip install --system vllm==0.10.0 torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
+RUN uv pip install --system --index-strategy unsafe-best-match vllm==0.10.0 --extra-index-url https://download.pytorch.org/whl/cu128
 # Downgrade triton because following error occured when using triton==3.3.1
 # https://github.com/vllm-project/vllm/issues/20259#issuecomment-3157159183
+# https://github.com/vllm-project/vllm/issues/19203#issuecomment-2989796604
 # /usr/local/lib/python3.12/dist-packages/vllm/attention/ops/prefix_prefill.py:36:0: error: Failures have been detected while processing an MLIR pass pipeline
 # /usr/local/lib/python3.12/dist-packages/vllm/attention/ops/prefix_prefill.py:36:0: note: Pipeline failed while executing [`ConvertTritonGPUToLLVM` on 'builtin.module' operation]: reproducer generated at `std::errs, please share the reproducer above with Triton project.`
 # INFO:     10.16.9.222:28100 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
 # INFO:     Waiting for application shutdown.
 # INFO:     Application shutdown complete.
 # INFO:     Finished server process [27]
+RUN uv pip install --system --index-strategy unsafe-best-match triton==3.2 --extra-index-url https://download.pytorch.org/whl/cu128
 # # Then, install xformers with the --no-build-isolation flag
 # RUN uv pip install --system \
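
The change above splits the single install into two steps: vLLM first, then the triton pin. A minimal sketch of the resulting order follows, as a Dockerfile fragment meant to sit after the existing `RUN pip install uv setuptools` line. The two install commands are copied from the diff. The final `RUN` check is not part of the commit; it is a hypothetical addition that assumes `python3` is on the image's PATH and that the pinned wheel reports a version starting with `3.2`.

```dockerfile
# Step 1 (from the commit): install vLLM and let the resolver choose its dependencies.
RUN uv pip install --system --index-strategy unsafe-best-match vllm==0.10.0 --extra-index-url https://download.pytorch.org/whl/cu128

# Step 2 (from the commit): pin triton afterwards, replacing the triton version,
# if any, that step 1 pulled in.
RUN uv pip install --system --index-strategy unsafe-best-match triton==3.2 --extra-index-url https://download.pytorch.org/whl/cu128

# Hypothetical check (not in the commit): fail the build if the pin did not take effect.
RUN python3 -c "from importlib.metadata import version; v = version('triton'); assert v.startswith('3.2'), v"
```

With a check like this, a mismatched triton would fail the image build instead of surfacing later as the `ConvertTritonGPUToLLVM` error on a `/v1/chat/completions` request.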