arXiv:2509.19873

SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding

Published on Sep 24, 2025

AI-generated summary

SpecMamba, an FPGA-based accelerator, enhances speculative decoding for Mamba State Space Models, achieving significant speedup and energy efficiency improvements over GPU and previous FPGA solutions.

Abstract

The growing demand for efficient long-sequence modeling on edge devices has propelled widespread adoption of State Space Models (SSMs) such as Mamba, owing to their superior computational efficiency and scalability. Because their autoregressive generation process remains memory-bound, speculative decoding, which pairs draft-model generation with target-model verification, has been proposed to accelerate it. However, directly applying speculative decoding to SSMs faces three key challenges: (1) difficulty in backtracking the hidden state, (2) incompatibility with tree-based parallel verification, and (3) a mismatch between the resulting workload and the hardware. To address these challenges, we propose SpecMamba, the first FPGA-based accelerator for Mamba with speculative decoding, featuring system, algorithm, and hardware co-design. At the system level, we present a memory-aware hybrid backtracking strategy to coordinate the two models. At the algorithm level, we propose first-in-first-out (FIFO)-based tree verification with tiling to minimize memory access. At the hardware level, we customize a dataflow that computes linear layers in parallel and SSM layers in series to maximize overlap. Implemented on AMD FPGA platforms (VHK158 and VCK190), SpecMamba achieves a 2.27x speedup over GPU baselines and a 2.85x improvement over prior FPGA solutions, while demonstrating 5.41x and 1.26x higher energy efficiency, respectively.
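The speculative-decoding loop and the hidden-state backtracking problem described above can be made concrete with a small sketch. The following Python example is a minimal illustration, not the SpecMamba implementation: ToySSM, draft_propose, and verify_with_backtracking are hypothetical names, the models are untrained toys, and simple greedy chain verification stands in for the paper's FIFO-based tree verification. The point it demonstrates is why an SSM verifier needs explicit state checkpointing where a Transformer verifier could simply truncate its KV cache after a rejection.

```python
# Minimal sketch of speculative decoding with hidden-state backtracking
# for an SSM. Illustrative only: not the SpecMamba implementation.
import numpy as np

class ToySSM:
    """Toy diagonal linear SSM standing in for a Mamba-style model."""
    def __init__(self, vocab_size, state_dim, seed):
        rng = np.random.default_rng(seed)
        self.A = rng.uniform(0.5, 0.99, state_dim)         # diagonal recurrence
        self.B = rng.normal(size=(state_dim, vocab_size))  # input projection
        self.C = rng.normal(size=(vocab_size, state_dim))  # output projection
        self.h = np.zeros(state_dim)                       # recurrent hidden state

    def step(self, token):
        """Consume one token, update the state, return next-token logits."""
        x = np.zeros(self.B.shape[1])
        x[token] = 1.0
        self.h = self.A * self.h + self.B @ x
        return self.C @ self.h

def draft_propose(draft, pending_token, num_draft):
    """Draft model greedily proposes `num_draft` candidate tokens."""
    proposed, tok = [], pending_token
    for _ in range(num_draft):
        tok = int(np.argmax(draft.step(tok)))
        proposed.append(tok)
    return proposed

def verify_with_backtracking(target, pending_token, proposed):
    """Greedy chain verification with hidden-state checkpointing.

    The recurrent state has no per-token cache to truncate, so we snapshot
    it at every step and, after finding the first mismatch, restore the
    snapshot taken after the last accepted token.
    """
    tokens = [pending_token] + proposed
    checkpoints, logits = [], []
    for tok in tokens:                        # consume the whole chunk first,
        logits.append(target.step(tok))       # as a parallel scan would
        checkpoints.append(target.h.copy())   # state after consuming `tok`

    n_accept = 0                              # logits[i] predicts tokens[i+1]
    while (n_accept < len(proposed)
           and int(np.argmax(logits[n_accept])) == proposed[n_accept]):
        n_accept += 1

    target.h = checkpoints[n_accept]          # backtrack to last valid state
    correction = int(np.argmax(logits[n_accept]))  # target's own next token
    return proposed[:n_accept], correction

if __name__ == "__main__":
    draft = ToySSM(vocab_size=32, state_dim=8, seed=0)
    target = ToySSM(vocab_size=32, state_dim=16, seed=1)
    generated, tok = [], 0                    # `tok` is generated, not yet consumed
    for _ in range(4):                        # a few speculative rounds
        saved = draft.h.copy()                # checkpoint the draft state too
        proposed = draft_propose(draft, tok, num_draft=4)
        accepted, correction = verify_with_backtracking(target, tok, proposed)
        draft.h = saved                       # backtrack the draft by recomputing
        for t in [tok] + accepted:            # only the accepted prefix
            draft.step(t)
        generated += accepted + [correction]
        tok = correction
    print("generated tokens:", generated)
```

Note that the sketch backtracks the target by caching checkpoints and the draft by recomputation; this cache-versus-recompute trade-off is plausibly the kind the paper's memory-aware hybrid backtracking strategy negotiates, and its FIFO-based tree verification generalizes the single chain shown here to a token tree.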
