World's smallest working R1-0528 Quant!

With recent PRs from ik_llama.cpp, you can now run the full DeepSeek-R1-0528
671B model in 128GB RAM + 24GB VRAM! What a time to be alive!

- README.md +21 -8
- images/kld-r1-0528-smol-bois.png +3 -0
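The `IQ1_S_R4` GPU-offload path described in this commit is bleeding edge and needs [PR494](https://github.com/ikawrakow/ik_llama.cpp/pull/494). A minimal sketch of how you might build it before the PR lands, assuming the standard llama.cpp-style CMake build (the CUDA flag name can differ between trees, and paths here are illustrative):

```shell
# Fetch ik_llama.cpp and check out the PR494 branch via GitHub's pull/<n>/head ref
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
git fetch origin pull/494/head:pr494
git checkout pr494

# llama.cpp-style CMake build; -DGGML_CUDA=ON assumes a CUDA-capable GPU
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```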
README.md
CHANGED

@@ -45,14 +45,18 @@ So far these are my best recipes offering the lowest perplexity per GiB models s
   - `Final estimate: PPL = 3.5069 +/- 0.01893`
   - Fits 32k context in under 16GiB VRAM
   - Fits 64k context in under 24GiB VRAM
-* `DeepSeek-R1-0528-
-  - `Final estimate: PPL = 4.
+* `DeepSeek-R1-0528-IQ1_S_R4` 131GiB
+  - `Final estimate: PPL = 4.8805 +/- 0.02876`
+  - The world's smallest working DeepSeek-R1-0528 Quant!
+  - Runs on an AM5-class gaming rig with a 2x64GB DDR5 DIMM kit and a single GPU!
+  - Support for this is bleeding edge: you need [PR494](https://github.com/ikawrakow/ik_llama.cpp/pull/494)
   - Fits 32k+ context in under 16GiB VRAM
   - Should fit in 128GiB RAM + 24GB VRAM by offloading layers to GPU.
-  - *Don't use the old `IQ1_S_R4` if you need to offload to GPU!*
   - "Only for the desperate."
   - Technically "better" (lower) PPL than `Qwen3-235B-A22B-Q8_0 @ ~5.31` though you can't really make comparisons like this.
-
+
+#### TODO
+I might release my `iq2_kt` "QTIP/exl3/trellis" style quant, but it is rather experimental and the inferencing implementation needs more time to bake.
 
 #### `IQ4_KS_R4` 4.701 BPW (368GiB)
 Special mix `IQ5_KS_R4` `ffn_down` and `IQ4_KS_R4` `ffn_(up|gate)` routed experts. All other layers `q8_0` for CPU+GPU offload. For max speed on CPU *only* rigs use `--run-time-repack`.

@@ -307,9 +311,20 @@ custom=$(
 
 </details>
 
-#### `
-
-
+#### `IQ1_S_R4` 130.203 GiB (1.664 BPW)
+
+The world's smallest working DeepSeek-R1-0528 quant!
+
+![](images/kld-r1-0528-smol-bois.png)
+The Delta P numbers show the average RMS, 99th percentile, and absolute max divergence from the baseline pure `Q8_0`. Lower is better.
+
+If you can fit a larger model completely in RAM+VRAM I would recommend
+that, but if you have 128GB RAM + 24GB VRAM then give this a try as it
+is surprisingly usable despite heavy quantization.
+
+Support for this is bleeding edge: you need [PR494](https://github.com/ikawrakow/ik_llama.cpp/pull/494)!
+
+Special mix `IQ1_M_R4` `ffn_down` and `IQ1_S_R4` `ffn_(up|gate)` routed experts. All other layers mostly `iq4_ks` for CPU+GPU offload. For max speed on CPU *only* rigs use `--run-time-repack` (only applies to the `iq4_ks` tensors etc.).
 
 <details>

@@ -354,8 +369,6 @@ llama_new_context_with_model: CUDA_Host compute buffer size = 78.01 MiB
 
 </details>
 
-*NOTE*: Probably don't use the similar sized repacked version `IQ1_S_R4` 1.664 BPW (131GiB) as it can't run on GPU so only if you are doing CPU only or know what you're doing specifically e.g. having over 128GB RAM.
-
 <details>
 
 ![](
images/kld-r1-0528-smol-bois.png
ADDED
(binary image, stored via Git LFS)
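The PPL and Delta P statistics behind the added plot (average RMS, 99th percentile, and max divergence against the pure `Q8_0` baseline) are the kind reported by the perplexity tool's KL-divergence mode. A sketch with placeholder file names, assuming your build carries the upstream `--kl-divergence` flags:

```shell
# 1) Save baseline token logits from the Q8_0 reference (the output file gets large)
./build/bin/llama-perplexity \
  -m DeepSeek-R1-0528-Q8_0.gguf \
  -f calibration-corpus.txt \
  --kl-divergence-base q8_0-logits.bin

# 2) Score a small quant against that baseline; this prints PPL plus
#    RMS / 99th-percentile / maximum Delta-p statistics
./build/bin/llama-perplexity \
  -m DeepSeek-R1-0528-IQ1_S_R4.gguf \
  --kl-divergence-base q8_0-logits.bin \
  --kl-divergence
```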