Lmxyy committed on
Commit c5785cf · verified · 1 parent: b70dc6b

Update README.md

Files changed (1):
  1. README.md +11 -82
README.md CHANGED
@@ -1,97 +1,26 @@
  ---
  license: apache-2.0
  tags:
  - text-to-image
  - SVDQuant
  - FLUX.1-schnell
- - INT4
  - FLUX.1
  - Diffusion
  - Quantization
  - ICLR2025
- language:
- - en
- base_model:
- - black-forest-labs/FLUX.1-schnell
- base_model_relation: quantized
- pipeline_tag: text-to-image
- datasets:
- - mit-han-lab/svdquant-datasets
- library_name: diffusers
- ---
-
- <p align="center" style="border-radius: 10px">
- <img src="https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/logo.svg" width="50%" alt="logo"/>
- </p>
- <h4 style="display: flex; justify-content: center; align-items: center; text-align: center;">Quantization Library:&nbsp;<a href='https://github.com/mit-han-lab/deepcompressor'>DeepCompressor</a> &ensp; Inference Engine:&nbsp;<a href='https://github.com/mit-han-lab/nunchaku'>Nunchaku</a>
- </h4>
-
- <div style="display: flex; justify-content: center; align-items: center; text-align: center;">
- <a href="https://arxiv.org/abs/2411.05007">[Paper]</a>&ensp;
- <a href='https://github.com/mit-han-lab/nunchaku'>[Code]</a>&ensp;
- <a href='https://svdquant.mit.edu'>[Demo]</a>&ensp;
- <a href='https://hanlab.mit.edu/projects/svdquant'>[Website]</a>&ensp;
- <a href='https://hanlab.mit.edu/blog/svdquant'>[Blog]</a>
- </div>
-
- ![teaser](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/teaser.jpg)
- SVDQuant is a post-training quantization technique for 4-bit weights and activations that maintains visual fidelity well. On the 12B FLUX.1-dev model, it achieves a 3.6× memory reduction compared to the BF16 model. By eliminating CPU offloading, it delivers an 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU and is 3× faster than the NF4 W4A16 baseline. On PixArt-Σ, it demonstrates significantly superior visual quality over other W4A4 and even W4A8 baselines. "E2E" means the end-to-end latency, including the text encoder and the VAE decoder.

- `svdq-int4-flux.1-schnell` is an INT4-quantized version of [`FLUX.1-schnell`](https://huggingface.co/black-forest-labs/FLUX.1-schnell), which can generate an image based on a text description.
-
- ## Method
- #### Quantization Method -- SVDQuant
-
- ![intuition](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/intuition.gif)
- Overview of SVDQuant. Stage 1: Originally, both the activations ***X*** and the weights ***W*** contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from the activations to the weights, yielding updated activations and weights. The activations become easier to quantize, but the weights now become more difficult. Stage 3: SVDQuant further decomposes the weights into a low-rank component and a residual via SVD, so the quantization difficulty is absorbed by the low-rank branch, which runs at 16-bit precision.
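To make the three stages concrete, here is a minimal PyTorch sketch of the decomposition idea. It is an illustration under simplifying assumptions (rank 32 and a naive symmetric per-channel INT4 quantizer), not the DeepCompressor/Nunchaku implementation.

```python
# Illustrative sketch of the SVDQuant weight decomposition (not the actual
# DeepCompressor/Nunchaku implementation). W is split into a 16-bit low-rank
# branch L1 @ L2 plus a residual R that is quantized to 4 bits; rank 32 and the
# simple symmetric per-channel quantizer below are assumptions for illustration.
import torch


def svd_lowrank_split(weight: torch.Tensor, rank: int = 32):
    """Decompose W (out_features x in_features) into L1 @ L2 + residual."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    L1 = U[:, :rank] * S[:rank]          # (out, rank), absorbs the singular values
    L2 = Vh[:rank, :]                    # (rank, in)
    residual = weight.float() - L1 @ L2  # smaller magnitude, easier to quantize
    return L1, L2, residual


def fake_quant_int4(x: torch.Tensor) -> torch.Tensor:
    """Simulated symmetric per-output-channel INT4 quantization (dequantized view)."""
    scale = x.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    return torch.clamp(torch.round(x / scale), -8, 7) * scale


# A linear layer computes X @ W.T; with SVDQuant this becomes
#   X @ L2.T @ L1.T  (low-rank branch, 16-bit)  +  X @ Rq.T  (residual, 4-bit)
W = torch.randn(3072, 3072)
X = torch.randn(4, 3072)
L1, L2, R = svd_lowrank_split(W)
Rq = fake_quant_int4(R)
Y = (X @ L2.T) @ L1.T + X @ Rq.T
print(torch.dist(Y, X @ W.T))  # overall approximation error
```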
-
- #### Nunchaku Engine Design
-
- ![engine](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/engine.jpg) (a) Naïvely running the low-rank branch with rank 32 introduces a 57% latency overhead due to the extra reads of 16-bit inputs in *Down Projection* and the extra writes of 16-bit outputs in *Up Projection*. Nunchaku reduces this overhead with kernel fusion. (b) The *Down Projection* and *Quantize* kernels use the same input, while the *Up Projection* and *4-Bit Compute* kernels share the same output. To reduce data-movement overhead, we fuse the first two and the latter two kernels together.
-
- ## Model Description
-
- - **Developed by:** MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs
- - **Model type:** INT W4A4 model
- - **Model size:** 6.64 GB
- - **Model resolution:** The number of pixels (width × height) must be a multiple of 65,536 (see the check after this list).
- - **License:** Apache-2.0
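As a quick illustration of the resolution rule above (a hypothetical helper, not part of the model card or the library): width × height must be divisible by 65,536, i.e. by 256 × 256.

```python
# Illustrative check: a requested resolution is valid for this model when
# width * height is a multiple of 65,536 (= 256 * 256).
def resolution_ok(width: int, height: int) -> bool:
    return (width * height) % 65_536 == 0


print(resolution_ok(1024, 1024))  # True  (1,048,576 = 16 * 65,536)
print(resolution_ok(512, 2048))   # True
print(resolution_ok(1000, 1000))  # False (1,000,000 is not a multiple of 65,536)
```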
-
- ## Usage
-
- ### Diffusers
-
- Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Then you can run the model with
-
- ```python
- import torch
- from diffusers import FluxPipeline
-
- from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel
-
- transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-schnell")
- pipeline = FluxPipeline.from_pretrained(
-     "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
- ).to("cuda")
- image = pipeline(
-     "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
- ).images[0]
- image.save("flux.1-schnell-int4.png")
- ```
-
- ### ComfyUI
-
- ![comfyui](https://github.com/mit-han-lab/nunchaku/blob/main/assets/comfyui.jpg?raw=true)
- Please check [comfyui/README.md](comfyui/README.md) for usage instructions.
-
- ## Limitations
-
- - The model is only runnable on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this [issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details; a quick way to check your GPU's architecture is sketched after this list.
- - You may observe some slight differences from the BF16 models in fine details.
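The following small check (an illustration, assuming PyTorch with CUDA is installed) prints the compute-capability string reported by the current GPU; the list above remains the authoritative reference.

```python
# Report the CUDA compute capability of the current GPU; the INT4 kernels above
# require sm_80, sm_86, or sm_89.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    arch = f"sm_{major}{minor}"
    print(arch, "supported" if arch in {"sm_80", "sm_86", "sm_89"} else "not supported")
else:
    print("No CUDA GPU detected.")
```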
-
- ### Citation
 
- If you find this model useful or relevant to your research, please cite
 
  ```bibtex
  @inproceedings{
 
  ---
+ base_model: black-forest-labs/FLUX.1-schnell
+ base_model_relation: quantized
+ datasets:
+ - mit-han-lab/svdquant-datasets
+ language:
+ - en
+ library_name: diffusers
  license: apache-2.0
+ pipeline_tag: text-to-image
  tags:
  - text-to-image
  - SVDQuant
  - FLUX.1-schnell
  - FLUX.1
  - Diffusion
  - Quantization
  - ICLR2025

+ ---
+ **This repository has been deprecated and will be hidden in December 2025. Please use https://huggingface.co/nunchaku-tech/nunchaku-flux.1-schnell.**
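For readers migrating, the snippet below is a minimal sketch that simply swaps the repository id into the usage example from the old model card. It assumes the replacement repository can still be loaded through `NunchakuFluxTransformer2dModel.from_pretrained`, which may not match the current Nunchaku API; check the new repository's model card for the authoritative instructions.

```python
# Hedged migration sketch: same pipeline as the old usage example, pointing at the
# replacement repository. The exact loading call may differ in current Nunchaku
# releases; consult https://huggingface.co/nunchaku-tech/nunchaku-flux.1-schnell.
import torch
from diffusers import FluxPipeline
from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

transformer = NunchakuFluxTransformer2dModel.from_pretrained("nunchaku-tech/nunchaku-flux.1-schnell")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline(
    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
).images[0]
image.save("flux.1-schnell-int4.png")
```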

+ ## Citation

  ```bibtex
  @inproceedings{