EAGLE-3 Draft Model for Qwen3-4B-Thinking-2507
Model Overview
This repository contains an EAGLE-3 style draft model specifically trained to accelerate the inference of the Qwen3-4B-Thinking-2507 large language model.
This is not a standalone model. It must be used in conjunction with its corresponding base model (Qwen3-4B-Thinking-2507) within a speculative decoding framework to achieve significant speedups in text generation.
- Base Model: Qwen3-4B-Thinking-2507
- Model Architecture: EAGLE-3 (Speculative Decoding Draft Model)
- Primary Benefit: Increases text generation throughput by roughly 1.5x to 2.5x without compromising the generation quality of the base model.
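To use the draft model for inference, pair it with the base model in a serving framework that implements EAGLE-3 speculative decoding. The sketch below shows one possible setup with SGLang's offline Engine API; it is an illustration under assumptions, not a verified recipe: the draft-model path is a placeholder, and the speculative-decoding argument names should be checked against your installed SGLang version.

```python
# Minimal sketch (assumptions flagged inline): serving Qwen3-4B-Thinking-2507
# with this EAGLE-3 draft model via SGLang's offline Engine API.
import sglang as sgl

llm = sgl.Engine(
    model_path="Qwen/Qwen3-4B-Thinking-2507",          # base (target) model
    speculative_algorithm="EAGLE3",                    # enable EAGLE-3 drafting
    speculative_draft_model_path="path/to/this-repo",  # placeholder: this draft model
    speculative_num_steps=3,         # autoregressive draft steps per cycle
    speculative_eagle_topk=1,        # draft-tree branching factor
    speculative_num_draft_tokens=4,  # matches the benchmark setting below
)

out = llm.generate("Explain speculative decoding in one paragraph.")
print(out["text"])
```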
What is EAGLE?
EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) is an advanced speculative decoding method. A small draft model cheaply proposes a short sequence of draft tokens, which the larger, more powerful base model then verifies in parallel with a single forward pass. Each accepted draft token advances the generation one step, so an accepted draft moves the output forward multiple tokens per base-model pass, yielding a substantial speedup while leaving the base model's outputs unchanged.
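To make the draft-and-verify cycle concrete, here is a minimal, framework-agnostic sketch of greedy speculative decoding. It is a simplification: `draft_model` and `base_model` are hypothetical callables, and real EAGLE-3 implementations draft at the hidden-feature level and verify token trees rather than a single sequence.

```python
def speculative_decode_step(base_model, draft_model, tokens, num_draft_tokens=4):
    """One greedy draft-and-verify cycle (simplified; EAGLE-3 verifies token trees)."""
    # 1) Draft: the small model proposes tokens one by one (cheap per step).
    draft, ctx = [], list(tokens)
    for _ in range(num_draft_tokens):
        t = draft_model(ctx)      # hypothetical callable: next greedy token
        draft.append(t)
        ctx.append(t)

    # 2) Verify: a single base-model forward pass scores all draft positions.
    #    Hypothetical callable: returns the base model's greedy token at each
    #    position, so len(predicted) == len(draft) + 1.
    predicted = base_model(tokens, draft)

    # 3) Accept the longest prefix where draft and base model agree, then
    #    take one extra "bonus"/correction token from the base model.
    accepted = []
    for d, p in zip(draft, predicted):
        if d != p:
            break
        accepted.append(d)
    accepted.append(predicted[len(accepted)])

    # acc_length for this cycle == len(accepted)
    return tokens + accepted
```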
Performance
This model was evaluated on a diverse set of benchmarks. The acc_length metric (the average number of tokens accepted per draft-and-verify cycle) indicates how efficient the acceleration is; a higher value is better.
| Benchmark | acc_length (num_draft_tokens=4) | acc_length (num_draft_tokens=8) |
|---|---|---|
| gsm8k | 2.07 | 2.07 |
| humaneval | 1.99 | 1.98 |
| math500 | 1.98 | 1.98 |
| ceval | 1.82 | 1.82 |
| cmmlu | 1.76 | 1.76 |
| mtbench | 1.71 | 1.71 |
| Average | ~1.89 | ~1.89 |
These results demonstrate consistent and effective acceleration across various tasks, including coding, math, and general conversation.
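As a rough, assumption-laden reading of these numbers: each base-model forward pass commits acc_length tokens on average, so acc_length is an upper bound on the per-pass speedup if the draft model's own cost were free (it is not, which is why realized end-to-end speedups are lower).

```python
# Back-of-the-envelope bound implied by the num_draft_tokens=4 column above,
# deliberately ignoring the draft model's own runtime (a simplifying assumption).
acc_lengths = {"gsm8k": 2.07, "humaneval": 1.99, "math500": 1.98,
               "ceval": 1.82, "cmmlu": 1.76, "mtbench": 1.71}

avg = sum(acc_lengths.values()) / len(acc_lengths)
print(f"average acc_length: {avg:.2f}")   # ~1.89 tokens per base-model pass
```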
Training Details
- Training Framework: This model was trained with SpecForge, an open-source framework for speculative decoding research.
- Training Data: The model was trained on the EagleChat dataset, which is available on Hugging Face and ModelScope.
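For reference, a minimal sketch of fetching the training data with the Hugging Face `datasets` library; the repository ID below is a hypothetical placeholder, since the exact Hub path is not stated here.

```python
# Hypothetical sketch: the repo ID is a placeholder, not the real Hub path.
from datasets import load_dataset

ds = load_dataset("ORG_NAME/EagleChat")  # placeholder: substitute the actual ID
print(ds)
```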