This model is a merge of three differently quantized models from the unsloth/DeepSeek-R1-0528-GGUF repository. Everything except the routed experts comes from the Q8_0 quant; most routed experts come from UD-Q4-XL, and the routed experts of six particularly critical blocks come from UD-Q5-XL.
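The selection rule behind the merge can be summarized in a few lines. Below is a minimal, illustrative sketch in Python, assuming the usual llama.cpp GGUF naming in which routed-expert tensors carry an `_exps` suffix (e.g. `blk.12.ffn_down_exps.weight`). The block indices marked as critical are placeholders, not the actual six blocks, and the tensor copying itself would still have to be done with a GGUF tool such as the gguf-py package.

```python
import re

# Hypothetical sketch of the per-tensor source selection used for this merge.
# The block indices below are placeholders, not the actual "critical" blocks.
CRITICAL_BLOCKS = {3, 4, 5, 6, 7, 8}  # placeholder: the 6 blocks kept at Q5

# Routed-expert tensors in llama.cpp GGUF exports carry the "_exps" suffix,
# e.g. blk.12.ffn_down_exps.weight.
ROUTED_EXPERT_RE = re.compile(r"^blk\.(\d+)\.ffn_(gate|down|up)_exps\.weight$")

def source_for(tensor_name: str) -> str:
    """Return which source GGUF a given tensor should be copied from."""
    m = ROUTED_EXPERT_RE.match(tensor_name)
    if m is None:
        return "Q8_0"            # everything that is not a routed expert
    block = int(m.group(1))
    if block in CRITICAL_BLOCKS:
        return "UD-Q5-XL"        # routed experts of the six critical blocks
    return "UD-Q4-XL"            # all other routed experts

if __name__ == "__main__":
    for name in ("token_embd.weight",
                 "blk.4.ffn_down_exps.weight",
                 "blk.40.ffn_up_exps.weight"):
        print(f"{name:35s} <- {source_for(name)}")
```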

After raising the GPU wired-memory limit on macOS with `sudo sysctl iogpu.wired_limit_mb=516096`, my tests show the model achieves maximum performance with a 16k context window under this size constraint. A 16k context window is often more than enough, and those with more memory can of course opt for a larger one. The merge is clearly much smarter than homogeneously quantized versions of the same size.
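For reference, here is a hedged sketch of that setup as a small Python launcher. It only assumes the `sysctl` key quoted above and llama.cpp's `llama-server` binary with its standard `-m`, `-c`, and `--n-gpu-layers` options; the binary path and model filename are placeholders to adjust for your install.

```python
import subprocess

# Sketch only: read the macOS GPU wired-memory limit set earlier with
# `sudo sysctl iogpu.wired_limit_mb=516096`, then start llama.cpp's server
# with a 16k context window. Paths below are placeholders.
limit_mb = int(subprocess.check_output(
    ["sysctl", "-n", "iogpu.wired_limit_mb"]).decode().strip())
print(f"GPU wired limit: {limit_mb} MB")

subprocess.run([
    "./llama-server",
    "-m", "DeepSeek-R1-0528-optimized-for-512Gb.gguf",  # placeholder filename
    "-c", "16384",            # 16k context window
    "--n-gpu-layers", "999",  # offload all layers to Metal
])
```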

Model details: GGUF, 671B parameters, deepseek2 architecture.