Need a big quant like UD Q8 XXL
The small quants are not performing well.
https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/discussions/4
I read that thread, and it sounds like people are having trouble with many different quants and suggest that Qwen3-30B-A3B-Thinking-2507 is doing better?
My largest IQ5_K is only a few percent higher in perplexity than the baseline BF16, so in theory it is basically as good as a Q8_0 (I didn't test KLD, etc.).
Anyway, it could just be that this small coder model isn't as good, unfortunately. Maybe try the Thinking version to see if it works better for your application?
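For reference, here's roughly how I compare a quant against the BF16 baseline. A minimal sketch: it shells out to llama.cpp's llama-perplexity tool and parses the final PPL estimate. The model file names, the binary path, and the wikitext test file are all placeholders for whatever you have locally.

```python
# Minimal sketch: compare perplexity of a quantized GGUF against a BF16
# baseline using llama.cpp's llama-perplexity tool. All paths/file names
# below are assumptions -- adjust them for your own setup.
import re
import subprocess

TEST_FILE = "wiki.test.raw"  # wikitext-2 test split; path is an assumption

def perplexity(model_path: str) -> float:
    """Run llama-perplexity on one GGUF and parse the final PPL estimate."""
    out = subprocess.run(
        ["./llama-perplexity", "-m", model_path, "-f", TEST_FILE],
        capture_output=True, text=True, check=True,
    )
    # the tool prints a line like: "Final estimate: PPL = 5.1234 +/- 0.0321"
    match = re.search(r"PPL = ([0-9.]+)", out.stdout + out.stderr)
    if match is None:
        raise RuntimeError(f"no PPL found in output for {model_path}")
    return float(match.group(1))

# model file names are placeholders -- point them at your local GGUFs
base = perplexity("Qwen3-Coder-30B-A3B-Instruct-BF16.gguf")
quant = perplexity("Qwen3-Coder-30B-A3B-Instruct-IQ5_K.gguf")
print(f"BF16 PPL: {base:.4f}  quant PPL: {quant:.4f}  "
      f"delta: {100 * (quant / base - 1):+.2f}%")
```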
I'll just stick with Q8; it looks good to me, tbh. The small models aren't doing very well. By the way, kindly check this model: https://huggingface.co/MetaStoneTec/XBai-o4
Too many new models, haha... I'm still waiting on GGUF support for some of them, so it's hard to quant them all until support is added. Do you know if XBai-o4 has a llama.cpp PR, or is its architecture already compatible?
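(If anyone wants to check quickly: the `architectures` field in the repo's config.json gives the HF class name, and if that name already appears in llama.cpp's convert_hf_to_gguf.py, conversion should work without a new PR. A minimal stdlib-only sketch; whether config.json sits at that exact path in this repo is an assumption.)

```python
# Minimal sketch: fetch a repo's config.json and print the declared
# architecture, to cross-check against llama.cpp's converter.
import json
import urllib.request

REPO = "MetaStoneTec/XBai-o4"
URL = f"https://huggingface.co/{REPO}/raw/main/config.json"

with urllib.request.urlopen(URL) as resp:
    config = json.load(resp)

# e.g. ["Qwen2ForCausalLM"]; grep this name in llama.cpp's
# convert_hf_to_gguf.py to see if the converter already knows it
print("architectures:", config.get("architectures"))
print("model_type:   ", config.get("model_type"))
```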
It's working in llama.cpp, but I can't see the thinking tokens in the interface. It looks cool; the only problem is that it's a dense model. Yep, so many MoEs too.