EAGLE-3 Draft Model for Qwen3-VL-2B-Instruct

Model Overview

This repository contains an EAGLE-3 style draft model trained specifically to accelerate inference of the Qwen3-VL-2B-Instruct vision-language model.

This is not a standalone model. It must be used in conjunction with its corresponding base model (Qwen3-VL-2B-Instruct) within a speculative decoding framework to achieve significant speedups in text generation.

  • Base Model: Qwen3-VL-2B-Instruct
  • Model Architecture: EAGLE-3 (Speculative Decoding Draft Model)
  • Primary Benefit: Increases text generation throughput by roughly 1.5x to 2.5x without compromising the generation quality of the base model.
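
As a usage sketch, the draft model plugs into a serving framework that supports EAGLE-3 speculative decoding. The snippet below assumes SGLang's offline engine and its speculative-decoding options; the argument names, the example values, and even the choice of framework are assumptions to adapt to your own setup.

```python
# Hypothetical launch sketch: serving Qwen3-VL-2B-Instruct with this EAGLE-3 draft model
# through SGLang's offline engine. Option names follow SGLang's speculative-decoding
# arguments and may differ across versions; treat them as placeholders.
import sglang as sgl

llm = sgl.Engine(
    model_path="Qwen/Qwen3-VL-2B-Instruct",                                  # base (target) model
    speculative_algorithm="EAGLE3",                                          # EAGLE-3 drafting
    speculative_draft_model_path="taobao-mnn/Qwen3-VL-2B-Instruct-Eagle3",   # this repository
    speculative_num_steps=3,          # draft depth per step (assumed value)
    speculative_eagle_topk=1,         # draft-tree branching factor (assumed value)
    speculative_num_draft_tokens=4,   # matches the num_draft_tokens=4 setting benchmarked below
)

out = llm.generate(
    "Explain speculative decoding in one sentence.",
    {"temperature": 0.0, "max_new_tokens": 64},
)
print(out["text"])
llm.shutdown()
```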

What is EAGLE?

EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) is an advanced speculative decoding method. A small draft model quickly proposes a short sequence of draft tokens, which the larger, more powerful base model then verifies in a single forward pass. Accepted tokens let the generation advance several positions at once, yielding a substantial speedup.
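
The draft-and-verify loop can be sketched schematically. The toy function below is not EAGLE-3's actual feature-level, tree-structured drafting; it only illustrates the accept/roll-back bookkeeping with greedy verification, using hypothetical draft_next and target_next stand-ins for the two models.

```python
# Schematic draft-and-verify step (greedy verification only; not real EAGLE-3 drafting).
# draft_next / target_next are hypothetical stand-ins for the draft and base models:
# each maps a token prefix to the next token id.
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     num_draft_tokens: int = 4) -> List[int]:
    # 1) Draft: the small model proposes num_draft_tokens tokens autoregressively (cheap).
    ctx, draft = list(prefix), []
    for _ in range(num_draft_tokens):
        token = draft_next(ctx)
        draft.append(token)
        ctx.append(token)

    # 2) Verify: the base model checks every drafted position (a single parallel pass in practice).
    ctx, accepted = list(prefix), []
    for token in draft:
        target_token = target_next(ctx)
        if target_token == token:        # draft agrees with the base model -> accept and continue
            accepted.append(token)
            ctx.append(token)
        else:                            # first disagreement -> keep the base model's token and stop
            accepted.append(target_token)
            break
    else:
        accepted.append(target_next(ctx))  # bonus token when the whole draft is accepted

    # The generation advances len(accepted) tokens for a single verification step.
    return prefix + accepted
```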

This model serves as the "draft model" in this process. Its average acceptance length (acc_length) across the benchmarks below is approximately 1.93 tokens with 4 draft tokens, meaning that on average it helps the base model advance nearly 2 tokens per verification step.
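
To connect the acceptance length with the throughput claim above: the acceptance length is an upper bound on the per-step token speedup, reduced by whatever the drafting itself costs. The overhead figure in the sketch below is a hypothetical assumption, not a measured value.

```python
# Back-of-the-envelope speedup estimate from the average acceptance length.
acc_length = 1.93        # avg accepted tokens per verification step (num_draft_tokens=4, see table below)
draft_overhead = 0.25    # assumed relative cost of drafting vs. one base-model pass (hypothetical)

ideal_speedup = acc_length                               # if drafting were free
estimated_speedup = acc_length / (1.0 + draft_overhead)
print(f"ideal ~{ideal_speedup:.2f}x, with overhead ~{estimated_speedup:.2f}x")  # ~1.93x / ~1.54x
```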

Performance

This model was evaluated on a diverse set of benchmarks. The acc_length (average number of accepted draft tokens) indicates the efficiency of the acceleration. A higher value is better.

Benchmark    acc_length (num_draft_tokens=4)    acc_length (num_draft_tokens=8)
humaneval    2.12                               2.30
math500      2.11                               2.27
ceval        1.86                               1.97
cmmlu        1.84                               1.97
gsm8k        1.83                               1.88
mtbench      1.79                               1.86
Average      ~1.93                              ~2.04

These results demonstrate consistent and effective acceleration across various tasks, including coding, math, and general conversation.
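
For reference, the column averages in the table follow directly from the per-benchmark numbers:

```python
# Recompute the table averages from the per-benchmark acceptance lengths.
acc_4 = [2.12, 2.11, 1.86, 1.84, 1.83, 1.79]   # num_draft_tokens=4
acc_8 = [2.30, 2.27, 1.97, 1.97, 1.88, 1.86]   # num_draft_tokens=8

print(f"{sum(acc_4) / len(acc_4):.3f}")   # 1.925 -> reported as ~1.93
print(f"{sum(acc_8) / len(acc_8):.3f}")   # 2.042 -> reported as ~2.04
```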

Training Details

  • Training Framework: This model was trained using SpecForge, an open-source framework for speculative decoding research.
  • Training Data: The model was trained on the EagleChat dataset, which is available on Hugging Face and ModelScope.
  • Training Duration: The model was trained for 2 epochs on 4x H20 GPUs, which took 27 hours and totaled 108 H20 GPU-hours.
