128 GiB "HIP out of memory" error
RuntimeError: Worker failed with error 'HIP out of memory. Tried to allocate 128.00 GiB. GPU 0 has a total capacity of 31.98 GiB of which 20.16 GiB is free. Of the allocated memory 10.74 GiB is allocated by PyTorch, and 695.69 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)', please check the stack trace above for the root cause
Did you follow this guide? https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3-VL.html
"It's highly recommended to specify --limit-mm-per-prompt.video 0 if your inference server will only process image inputs since enabling video inputs consumes more memory reserved for long video embeddings."