Unable to load with GPU layers
#10
by sambit-paul-poppulo · opened
If I attempt to run this on a CUDA device via llama.cpp:
export CUDA_VISIBLE_DEVICES=1
llama-embedding -m Qwen3-Embedding-4B-Q4_K_M.gguf -p "Who is it?<endoftext>" --verbose-prompt --embd-normalize 2 --gpu-layers 10 --pooling last
the embedding values returned are always a list of NaN.
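A quick way to confirm the output really is all-NaN (rather than a printing quirk) is to parse the vector and test each component. A minimal sketch in Python; the example vectors below are hypothetical, not actual llama-embedding output:

```python
import math

def has_nan(embedding):
    """Return True if any component of the embedding vector is NaN."""
    return any(math.isnan(x) for x in embedding)

# Hypothetical vectors: a healthy L2-normalized embedding vs a broken one.
good = [0.6, 0.8]
bad = [float("nan")] * 4

print(has_nan(good))  # False
print(has_nan(bad))   # True
```

If the CPU-only run (no offloaded layers) produces finite values for the same prompt while the GPU run does not, that points at the offloaded layers rather than the GGUF file itself.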
sambit-paul-poppulo changed discussion status to closed
sambit-paul-poppulo changed discussion status to open
I tried it in koboldcpp and it always ran out of memory, even with 10 GB of free VRAM. It did load using CPU only, but it was very slow on my Ryzen 9 5900X + 64 GB DDR4. I have two RTX 3090s in the system.
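For context on whether 10 GB of free VRAM should be enough: a weight-only back-of-envelope estimate for a 4B-parameter model at Q4_K_M comes out well under 10 GB, which suggests the OOM is coming from something other than the weights themselves (KV cache, context size, or runtime overhead). A rough sketch of that arithmetic; the bits-per-weight figure is an assumption, not an exact property of the file:

```python
def approx_gguf_weight_gb(n_params_billion, bits_per_weight):
    """Rough weight-only size estimate for a quantized model.
    Ignores KV cache, activations, and CUDA runtime overhead."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Assumption: Q4_K_M averages roughly 4.8 bits per weight.
print(round(approx_gguf_weight_gb(4, 4.8), 1))  # 2.4 (GB)
```

Under those assumptions the weights alone are around 2.4 GB, so reducing the context length or KV-cache precision may matter more than freeing additional VRAM.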