128 GiB "HIP out of memory" error
RuntimeError: Worker failed with error 'HIP out of memory. Tried to allocate 128.00 GiB. GPU 0 has a total capacity of 31.98 GiB of which 20.16 GiB is free. Of the allocated memory 10.74 GiB is allocated by PyTorch, and 695.69 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)', please check the stack trace above for the root cause
Did you follow this guide? https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3-VL.html
"It's highly recommended to specify --limit-mm-per-prompt.video 0 if your inference server will only process image inputs since enabling video inputs consumes more memory reserved for long video embeddings."