Need a big quant like UD Q8 XXL
The small quants are not performing well.
https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/discussions/4
I read that thread, and it sounds like people are having trouble with many different quants and suggest that Qwen3-30B-A3B-Thinking-2507 is doing better?
My largest IQ5_K is only a few percent higher in perplexity than the baseline BF16, so in theory it is basically as good as a Q8_0 (I didn't test KLD, etc.).
Anyway, it could just be that this small coder model isn't as good, unfortunately. Maybe try the Thinking version to see if it works better for your application?
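For reference, here's roughly how I compare a quant against the BF16 baseline. A minimal sketch: it shells out to llama.cpp's llama-perplexity tool and parses the final PPL estimate. The model file names, the binary path, and the wikitext test file are all placeholders for whatever you have locally.

```python
# Minimal sketch: compare perplexity of a quantized GGUF against a BF16
# baseline using llama.cpp's llama-perplexity tool. All paths/file names
# below are assumptions -- adjust them for your own setup.
import re
import subprocess

TEST_FILE = "wiki.test.raw"  # wikitext-2 test split; path is an assumption

def perplexity(model_path: str) -> float:
    """Run llama-perplexity on one GGUF and parse the final PPL estimate."""
    out = subprocess.run(
        ["./llama-perplexity", "-m", model_path, "-f", TEST_FILE],
        capture_output=True, text=True, check=True,
    )
    # the tool prints a line like: "Final estimate: PPL = 5.1234 +/- 0.0321"
    match = re.search(r"PPL = ([0-9.]+)", out.stdout + out.stderr)
    if match is None:
        raise RuntimeError(f"no PPL found in output for {model_path}")
    return float(match.group(1))

# model file names are placeholders -- point them at your local GGUFs
base = perplexity("Qwen3-Coder-30B-A3B-Instruct-BF16.gguf")
quant = perplexity("Qwen3-Coder-30B-A3B-Instruct-IQ5_K.gguf")
print(f"BF16 PPL: {base:.4f}  quant PPL: {quant:.4f}  "
      f"delta: {100 * (quant / base - 1):+.2f}%")
```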
I'll just stick with Q8; it looks good to me, tbh. The small models aren't doing very well. By the way, kindly check this model: https://huggingface.co/MetaStoneTec/XBai-o4
Too many new models, haha... I'm still waiting on GGUF support for some of them, so it's hard to quant them all until support is added. Do you know if XBai-o4 has a llama.cpp PR, or is its architecture already compatible?
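(If anyone wants to check quickly: the `architectures` field in the repo's config.json gives the HF class name, and if that name already appears in llama.cpp's convert_hf_to_gguf.py, conversion should work without a new PR. A minimal stdlib-only sketch; whether config.json sits at that exact path in this repo is an assumption.)

```python
# Minimal sketch: fetch a repo's config.json and print the declared
# architecture, to cross-check against llama.cpp's converter.
import json
import urllib.request

REPO = "MetaStoneTec/XBai-o4"
URL = f"https://huggingface.co/{REPO}/raw/main/config.json"

with urllib.request.urlopen(URL) as resp:
    config = json.load(resp)

# e.g. ["Qwen2ForCausalLM"]; grep this name in llama.cpp's
# convert_hf_to_gguf.py to see if the converter already knows it
print("architectures:", config.get("architectures"))
print("model_type:   ", config.get("model_type"))
```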
It's working in llama.cpp, but I can't see the thinking tokens in the interface. It looks cool; the only problem is that it's a dense model. Yep, so many MoEs too.