A 0.6B-parameter draft model (for speculative decoding) for use with deepseek-ai/DeepSeek-V3-0324 and deepseek-ai/DeepSeek-V3.

See jukofyork/DeepSeek-V3-0324-CODER-DRAFT-0.6B-v1.0 for the non-GGUF version, and a detailed explanation of how the model was created.

I've only included the Q4_0 quant (DeepSeek-V3-0324-CODER-DRAFT-0.6B-Q4_0.gguf), as this model's 14 attention heads don't allow any of the other 4-bit quants to be made, and experimentation has shown that using more or fewer than 4 bits for speculative decoding is a waste of time.
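For reference, a draft GGUF like this is typically used via llama.cpp's speculative-decoding support. The invocation below is a sketch, not a tested command: the main-model filename is a placeholder, and the `--draft-*` flag names and defaults can differ between llama.cpp versions, so check `llama-server --help` for your build.

```shell
# Serve DeepSeek-V3-0324 (the target model) with this GGUF as the
# speculative-decoding draft model.
# NOTE: the target-model path is a placeholder; flag names assume a
# recent llama.cpp build and may vary between versions.
llama-server \
    -m DeepSeek-V3-0324-Q4_K_M.gguf \
    -md DeepSeek-V3-0324-CODER-DRAFT-0.6B-Q4_0.gguf \
    --draft-max 16 \
    --draft-min 1
```

With speculative decoding, the small draft model proposes a short run of tokens each step and the large target model verifies them in a single batched pass, so output quality is unchanged while throughput improves whenever the draft's guesses are accepted.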

Model size: 590M params
Architecture: qwen2
Base model: Qwen/Qwen2.5-0.5B
