A 0.6B parameter draft (speculative decoding) model for use with deepseek-ai/DeepSeek-V3-0324 and deepseek-ai/DeepSeek-V3.
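As a rough illustration of pairing a draft with its target model, the sketch below assembles a llama.cpp `llama-server` command line in Python. It assumes llama.cpp is the runtime and relies only on its `--model` and `--model-draft` options; the target filename is a hypothetical placeholder, not a file named in this card.

```python
import shlex

def build_server_command(target_gguf, draft_gguf, binary="llama-server"):
    """Return an argv list that loads a target model plus a draft model.

    Assumes llama.cpp's llama-server, which takes the main model via
    --model and the speculative-decoding draft model via --model-draft.
    """
    return [binary, "--model", target_gguf, "--model-draft", draft_gguf]

# Hypothetical target filename: substitute whichever DeepSeek-V3 GGUF you use.
cmd = build_server_command(
    "DeepSeek-V3-0324-TARGET.gguf",
    "DeepSeek-V3-0324-CODER-DRAFT-0.6B-Q4_0.gguf",
)
print(shlex.join(cmd))
```

The command is only built and printed, not executed, so it can be inspected or extended with other options before running.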
See jukofyork/DeepSeek-V3-0324-CODER-DRAFT-0.6B-v1.0 for the non-GGUF version, and a detailed explanation of how the model was created.
I've only included the Q4_0 quant, DeepSeek-V3-0324-CODER-DRAFT-0.6B-Q4_0.gguf, as the 14 heads of this model don't allow any of the other 4-bit quants to be made, and experimentation has shown that using more or fewer than 4 bits for speculative decoding is a waste of time.
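One plausible reading of the head-count constraint can be checked with a little arithmetic. The sketch below assumes a head dimension of 64 (as in the Qwen2.5-0.5B config) and llama.cpp block sizes of 32 elements for Q4_0 and 256 elements for the K-quant super-block; none of these numbers are stated in this card, so treat them as assumptions.

```python
# Assumed values, not taken from this card:
NUM_HEADS = 14             # attention heads (stated in the card)
HEAD_DIM = 64              # assumed head dimension (Qwen2.5-0.5B config)
Q4_0_BLOCK = 32            # assumed Q4_0 block size in llama.cpp
K_QUANT_SUPER_BLOCK = 256  # assumed K-quant super-block size in llama.cpp

def row_fits(row_size, block_size):
    """True if a tensor row splits evenly into quantization blocks."""
    return row_size % block_size == 0

hidden_size = NUM_HEADS * HEAD_DIM  # 14 * 64 = 896

print("hidden size:", hidden_size)
print("fits Q4_0 blocks:", row_fits(hidden_size, Q4_0_BLOCK))
print("fits K-quant super-blocks:", row_fits(hidden_size, K_QUANT_SUPER_BLOCK))
```

Under these assumptions, 896 divides evenly by 32 but not by 256, which would be consistent with Q4_0 being usable while 256-element block formats are not.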
Quantization: 4-bit (Q4_0)
Base model: Qwen/Qwen2.5-0.5B