This model is a merge of three differently quantized models from the unsloth/DeepSeek-R1-0528-GGUF repository. Everything except the routed experts comes from the Q8_0 quant; most routed experts come from UD-Q4-XL, while the routed experts of six of the more critical blocks come from UD-Q5-XL.
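For illustration, here is a minimal sketch of the per-tensor selection logic such a merge implies. It assumes the common GGUF naming convention in which routed-expert weights are stored in `blk.<layer>.ffn_*_exps` tensors; the block indices listed below are placeholders, since the card does not name the six critical blocks.

```python
import re

# Hypothetical: indices of the six "critical" blocks whose routed experts
# are taken from the UD-Q5-XL source; the actual indices are not published here.
CRITICAL_BLOCKS = {0, 1, 2, 3, 4, 5}  # placeholder values

# Assumed GGUF naming convention for routed-expert tensors, e.g.
# "blk.<layer>.ffn_down_exps.weight", "blk.<layer>.ffn_gate_exps.weight".
ROUTED_EXPERT_RE = re.compile(r"^blk\.(\d+)\.ffn_(?:down|gate|up)_exps\.")

def choose_source(tensor_name: str) -> str:
    """Return which source quant a given tensor would be copied from."""
    m = ROUTED_EXPERT_RE.match(tensor_name)
    if m is None:
        # Attention, shared experts, embeddings, norms, output head, etc.
        return "Q8_0"
    layer = int(m.group(1))
    if layer in CRITICAL_BLOCKS:
        return "UD-Q5-XL"
    return "UD-Q4-XL"

if __name__ == "__main__":
    for name in ("token_embd.weight",
                 "blk.3.ffn_down_exps.weight",
                 "blk.42.ffn_gate_exps.weight"):
        print(f"{name:32s} -> {choose_source(name)}")
```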
On a Mac, after raising the GPU wired-memory limit with `sudo sysctl iogpu.wired_limit_mb=516096`, my tests show the model reaches maximum performance with a 16k context window under this size constraint. A 16k context window is often more than enough; those with more memory can opt for a larger one. It is clearly much smarter than homogeneously quantized versions of the same size.
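As a quick sanity check on that number, the snippet below shows what the sysctl value works out to, treating it as MiB. The interpretation that the remainder is headroom left for the OS on a 512 GiB machine is my assumption, not something stated in the card.

```python
# 516096 MiB is exactly 504 GiB; on a 512 GiB machine that leaves
# (by assumption) about 8 GiB of headroom for the operating system.
wired_limit_mib = 516096
total_ram_gib = 512  # the machine size this quant targets

wired_limit_gib = wired_limit_mib / 1024
headroom_gib = total_ram_gib - wired_limit_gib

print(f"wired limit : {wired_limit_gib:.0f} GiB")  # 504 GiB
print(f"OS headroom : {headroom_gib:.0f} GiB")     # 8 GiB
```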
Base model: deepseek-ai/DeepSeek-R1-0528