Unable to load with GPU layers

#10
by sambit-paul-poppulo - opened

If I attempt to run this on a CUDA device via llama.cpp:

export CUDA_VISIBLE_DEVICES=1
llama-embedding -m Qwen3-Embedding-4B-Q4_K_M.gguf -p "Who is it?<|endoftext|>" --verbose-prompt --embd-normalize 2 --gpu-layers 10 --pooling last

the embedding values returned are always a list of NaNs.

Is this expected behaviour?
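
One way to narrow this down is to rerun the same prompt entirely on CPU and compare: if the CPU run returns finite values while any non-zero --gpu-layers produces NaNs, the fault is in the CUDA offload path rather than in the GGUF file itself. A minimal sketch, reusing the command above (the model file and prompt are taken from it; nothing else is assumed):

# CPU-only baseline; should print finite embedding values
llama-embedding -m Qwen3-Embedding-4B-Q4_K_M.gguf -p "Who is it?<|endoftext|>" --pooling last --embd-normalize 2 --gpu-layers 0

# Partial GPU offload; if only this run prints nan, the CUDA path is the likely culprit
llama-embedding -m Qwen3-Embedding-4B-Q4_K_M.gguf -p "Who is it?<|endoftext|>" --pooling last --embd-normalize 2 --gpu-layers 10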

sambit-paul-poppulo changed discussion status to closed
sambit-paul-poppulo changed discussion status to open

I tried it in koboldcpp and it always ran out of memory, even with 10 GB of free VRAM. It did load CPU-only, but it was very slow on my Ryzen 9 5900X with 64 GB of DDR4. I have two RTX 3090s in the system.
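
For what it's worth, llama.cpp itself can split the model across both 3090s, which may sidestep the out-of-memory behaviour seen in koboldcpp. A sketch assuming the same model file as above; the even tensor split and the 2048-token context are illustrative values, not tested settings:

# expose both GPUs, offload all layers, split weights evenly, and cap the context
export CUDA_VISIBLE_DEVICES=0,1
llama-embedding -m Qwen3-Embedding-4B-Q4_K_M.gguf -p "Who is it?<|endoftext|>" --pooling last --embd-normalize 2 --gpu-layers 99 --tensor-split 1,1 --ctx-size 2048

A 4B model at Q4_K_M is only around 2.5 GB of weights, so the weights alone should fit comfortably on a single 3090; if memory still runs out, an oversized default context (and hence KV cache) is a more plausible cause than the model itself.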
