This model is a merge of three differently quantized models from the unsloth/DeepSeek-R1-0528-GGUF repository. Everything except the routed experts comes from the Q8_0 quant; most routed experts come from UD-Q4-XL, and the routed experts of six particularly critical blocks come from UD-Q5-XL.
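The selection rule behind the merge can be summarized in a few lines. Below is a minimal, illustrative sketch in Python, assuming the usual llama.cpp GGUF naming in which routed-expert tensors carry an `_exps` suffix (e.g. `blk.12.ffn_down_exps.weight`). The block indices marked as critical are placeholders, not the actual six blocks, and the tensor copying itself would still have to be done with a GGUF tool such as the gguf-py package.

```python
import re

# Hypothetical sketch of the per-tensor source selection used for this merge.
# The block indices below are placeholders, not the actual "critical" blocks.
CRITICAL_BLOCKS = {3, 4, 5, 6, 7, 8}  # placeholder: the 6 blocks kept at Q5

# Routed-expert tensors in llama.cpp GGUF exports carry the "_exps" suffix,
# e.g. blk.12.ffn_down_exps.weight.
ROUTED_EXPERT_RE = re.compile(r"^blk\.(\d+)\.ffn_(gate|down|up)_exps\.weight$")

def source_for(tensor_name: str) -> str:
    """Return which source GGUF a given tensor should be copied from."""
    m = ROUTED_EXPERT_RE.match(tensor_name)
    if m is None:
        return "Q8_0"            # everything that is not a routed expert
    block = int(m.group(1))
    if block in CRITICAL_BLOCKS:
        return "UD-Q5-XL"        # routed experts of the six critical blocks
    return "UD-Q4-XL"            # all other routed experts

if __name__ == "__main__":
    for name in ("token_embd.weight",
                 "blk.4.ffn_down_exps.weight",
                 "blk.40.ffn_up_exps.weight"):
        print(f"{name:35s} <- {source_for(name)}")
```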

After raising the GPU wired-memory limit on macOS with `sudo sysctl iogpu.wired_limit_mb=516096`, my tests show the model achieves maximum performance with a 16k context window under this size constraint. A 16k context window is often more than enough, and those with more memory can of course opt for a larger one. The merge is clearly much smarter than homogeneously quantized versions of the same size.
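For reference, here is a hedged sketch of that setup as a small Python launcher. It only assumes the `sysctl` key quoted above and llama.cpp's `llama-server` binary with its standard `-m`, `-c`, and `--n-gpu-layers` options; the binary path and model filename are placeholders to adjust for your install.

```python
import subprocess

# Sketch only: read the macOS GPU wired-memory limit set earlier with
# `sudo sysctl iogpu.wired_limit_mb=516096`, then start llama.cpp's server
# with a 16k context window. Paths below are placeholders.
limit_mb = int(subprocess.check_output(
    ["sysctl", "-n", "iogpu.wired_limit_mb"]).decode().strip())
print(f"GPU wired limit: {limit_mb} MB")

subprocess.run([
    "./llama-server",
    "-m", "DeepSeek-R1-0528-optimized-for-512Gb.gguf",  # placeholder filename
    "-c", "16384",            # 16k context window
    "--n-gpu-layers", "999",  # offload all layers to Metal
])
```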

Model details: GGUF, 671B parameters, deepseek2 architecture.