a few questions

#3
by lmganon123 - opened

I started making versions that fit my PC, and you probably know a lot more about all this by now, so I just have a few questions:

  1. I have been using a bf16 GGUF for quanting, but I suppose a bf16 safetensors would be the same?
  2. The imatrix isn't interchangeable between regular llama.cpp and ik_llama.cpp? And does making one still need 800 GB of RAM, or would page swapping work? I am making quants on my 7800X3D with 192 GB, and I am pretty sure it page swaps and takes about 16 hours (you can use that for your guides, since you only had your Threadripper estimate).

> I have been using a bf16 GGUF for quanting, but I suppose a bf16 safetensors would be the same?

You will need a bf16 GGUF to do any quantizing with ik_llama.cpp or llama.cpp. I describe a few methods to get either the original MLA style with attn_kv_b or the newer mainline style without attn_kv_b here: https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/discussions/1#68a9cbd78d7a0473ba5b3c8e
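For reference, a minimal sketch of that pipeline, assuming a local safetensors checkout and a llama.cpp build (all paths and the quant type are placeholders, not a recommendation):

```bash
# Convert the safetensors checkpoint to a bf16 GGUF.
# convert_hf_to_gguf.py ships with llama.cpp; --outtype bf16 keeps full bf16 weights.
python convert_hf_to_gguf.py /models/DeepSeek-V3.1 \
    --outtype bf16 \
    --outfile /models/DeepSeek-V3.1-bf16.gguf

# Then quantize from the bf16 GGUF (example target type; pick your own mix).
./build/bin/llama-quantize \
    /models/DeepSeek-V3.1-bf16.gguf \
    /models/DeepSeek-V3.1-Q4_K_M.gguf \
    Q4_K_M
```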

> The imatrix isn't interchangeable between regular llama.cpp and ik_llama.cpp?

They are somewhat interchangeable for non-MLA quants, but for MLA quants you will need the matching kind, with or without attn_kv_b. I have provided both styles of imatrix in this repo: the current one is for the mainline style without attn_kv_b. If you want the old one (the first one I uploaded here), follow the link in the discussion linked above.
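If you do end up regenerating an imatrix yourself rather than reusing one of the uploaded ones, the rough shape of the command is below. This is just a sketch: the calibration text file and paths are placeholders, and you would run it against the same bf16 GGUF you plan to quantize so the with/without attn_kv_b layout matches.

```bash
# Compute an importance matrix against the bf16 GGUF (placeholder paths).
# The resulting imatrix only matches GGUFs with the same tensor layout,
# i.e. with or without attn_kv_b, per the note above.
./build/bin/llama-imatrix \
    -m /models/DeepSeek-V3.1-bf16.gguf \
    -f calibration.txt \
    -o imatrix.dat

# Feed it into quantization so the low-bit types weight the important tensors.
./build/bin/llama-quantize \
    --imatrix imatrix.dat \
    /models/DeepSeek-V3.1-bf16.gguf \
    /models/DeepSeek-V3.1-IQ4_XS.gguf \
    IQ4_XS
```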

Cheers and Good luck!
