For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Some SDXL-based checkpoints are distributed in BF16 format rather than the more common FP16, which makes them suitable for lossless DFloat11 compression, so I decided to give it a try. Currently, only the UNet (the main diffusion model) is compressed, but in SDXL it is the largest part of the pipeline anyway.
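To illustrate why BF16 weights can be compressed losslessly, here is a toy sketch of the idea described by the DFloat11 project: the 8-bit exponent field of trained weights is highly predictable, so it can be Huffman-coded while the sign bit and 7-bit mantissa are stored verbatim. This is not DFloat11's actual implementation (which decodes on the GPU). The weights are synthetic Gaussian values standing in for a real tensor, and BF16 is approximated by truncating float32; both are assumptions made only for this illustration.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_bits(x: float) -> int:
    """Approximate the BF16 encoding of x as the top 16 bits of its float32 form.
    (Truncation; real BF16 conversion usually rounds to nearest even.)"""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def huffman_code(freqs: Counter) -> dict[int, str]:
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol counts."""
    # Heap entries: (count, unique tie-breaker, {symbol: code suffix so far})
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def decode(bits: str, code: dict[int, str]) -> list[int]:
    """Decode a bitstring produced with a prefix-free code."""
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out


random.seed(0)
# Synthetic stand-in for one weight tensor (NOT real NoobAI weights).
weights = [random.gauss(0.0, 0.02) for _ in range(20_000)]
words = [bf16_bits(w) for w in weights]
exponents = [(w >> 7) & 0xFF for w in words]          # 8-bit exponent field
sign_mantissa = [(w >> 15, w & 0x7F) for w in words]  # stored verbatim

code = huffman_code(Counter(exponents))
encoded = "".join(code[e] for e in exponents)

# Lossless: decoding and reattaching sign/mantissa restores every BF16 word.
decoded = decode(encoded, code)
rebuilt = [(s << 15) | (e << 7) | m for (s, m), e in zip(sign_mantissa, decoded)]
assert rebuilt == words

# 8 raw bits (sign + mantissa) plus the Huffman-coded exponent, per weight;
# ignores the small cost of storing the code table itself.
bits_per_weight = (len(encoded) + 8 * len(words)) / len(words)
print(f"{bits_per_weight:.2f} bits/weight vs 16 for BF16")
```

With these synthetic weights the result lands between 10 and 12 bits per weight; real checkpoints will differ somewhat, and the round-trip assertion is what makes the scheme lossless.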

Unfortunately, the stock DF11 codebase does not support loading compressed torch.nn.Conv2d weights, even though they compress just fine, so I left them uncompressed rather than requiring users to patch the DFloat11 codebase manually. This makes the final compressed model about 200 MB larger than it would otherwise be, but that is the price of compatibility. Nevertheless, the UNet's VRAM footprint still drops significantly, from 5.14 GB to 3.66 GB, which should let it fit on 6 GB GPUs (provided the GPU supports BF16).
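The size figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes "GB" means 10^9 bytes and treats the ~200 MB Conv2d overhead as exactly 0.2 GB; both are approximations, not measured values.

```python
# Figures quoted above, in GB (assumed to mean 10**9 bytes).
bf16_unet_gb = 5.14
df11_unet_gb = 3.66
uncompressed_conv_overhead_gb = 0.2  # the "~200 MB" estimate, approximate

saved_gb = bf16_unet_gb - df11_unet_gb
ratio = df11_unet_gb / bf16_unet_gb
# Size ratio implied if the Conv2d weights had been compressed as well.
ratio_with_conv = (df11_unet_gb - uncompressed_conv_overhead_gb) / bf16_unet_gb
# BF16 stores 2 bytes per parameter, so this is the UNet parameter count in billions.
params_billion = bf16_unet_gb / 2

print(f"saved {saved_gb:.2f} GB; DF11 UNet is {ratio:.1%} of its BF16 size")
print(f"about {ratio_with_conv:.1%} of BF16 if Conv2d layers were compressed too")
print(f"about {params_billion:.2f}B UNet parameters")
```

With these inputs it works out to about 1.48 GB saved, with the DF11 UNet at roughly 71% of its BF16 size (about 67% if the Conv2d layers were also compressed), and 5.14 GB corresponds to roughly 2.57 billion BF16 parameters.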

Feel free to request compression of other models as well (whether for the diffusers library, ComfyUI, or anything else), although models with architectures I am unfamiliar with may be more difficult.

How to Use

diffusers

  1. Install the DFloat11 pip package (this installs the CUDA kernel automatically; it requires a CUDA-compatible GPU and an existing PyTorch installation):

    pip install "dfloat11[cuda12]"
    # or, if you have CUDA 11:
    # pip install "dfloat11[cuda11]"
    
  2. To use the DFloat11 model, run the following example code in Python:

    from diffusers import StableDiffusionXLPipeline
    from dfloat11 import DFloat11Model
    import torch

    # Load the original NoobAI-XL v1.1 pipeline in BF16 (the format DF11 works with)
    pipe = StableDiffusionXLPipeline.from_single_file(
        "https://huggingface.co/Laxhar/noobai-XL-1.1/resolve/main/NoobAI-XL-v1.1.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Swap in the DF11-compressed UNet weights for the existing pipe.unet
    DFloat11Model.from_pretrained("mingyi456/noobai-XL-1.1-DF11", device="cpu", bfloat16_model=pipe.unet)

    pipe.to("cuda")

    prompt = "masterpiece, best quality, newest, absurdres, highres, 1girl"
    negative_prompt = "worst quality, old, early, low quality, lowres, signature, username, logo, bad hands, mutated hands, mammal, anthro, furry, ambiguous form, feral, semi-anthro"

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=5.0,
        num_inference_steps=35,
        width=832,
        height=1216,
        generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducible output
    ).images[0]

    image.save("NoobAI-XL-v1.1.png")
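Before running the example above, a short preflight script can confirm the prerequisites from step 1 and the BF16 requirement mentioned earlier. This is a sketch that only reports problems; it relies on standard PyTorch calls (torch.cuda.is_available, torch.cuda.is_bf16_supported, torch.cuda.get_device_properties) and does not verify the driver or which CUDA build of dfloat11 was installed.

```python
import importlib.util


def preflight() -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is missing."""
    problems = [
        f"Python package '{pkg}' is not installed"
        for pkg in ("torch", "diffusers", "dfloat11")
        if importlib.util.find_spec(pkg) is None
    ]
    if importlib.util.find_spec("torch") is None:
        return problems  # nothing more can be checked without PyTorch

    import torch

    if not torch.cuda.is_available():
        problems.append("PyTorch cannot see a CUDA GPU")
        return problems
    # Ask PyTorch whether the current CUDA device supports bfloat16.
    if not torch.cuda.is_bf16_supported():
        problems.append("the GPU does not report BF16 support")
    props = torch.cuda.get_device_properties(0)
    print(f"GPU 0: {props.name}, {props.total_memory / 1e9:.1f} GB total memory")
    return problems


problems = preflight()
print("ready" if not problems else "\n".join(problems))
```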
    

ComfyUI

Support for this model (and other SDXL-based models) will be coming soon to my fork of the ComfyUI DF11 custom node. Stay tuned.

Update: I have uploaded it here: https://huggingface.co/mingyi456/noobai-XL-1.1-DF11-ComfyUI

Compression Details

This is the pattern_dict for compression:

pattern_dict = {
    r"time_embedding" : (
        "linear_1",
        "linear_2"
    ),
    r"add_embedding" : (
        "linear_1",
        "linear_2"
    ),
    
    
    r"down_blocks\.0\.resnets\.\d+" : (
        "time_emb_proj",
    ),

    r"down_blocks\.1\.attentions\.\d+\.transformer_blocks\.\d+" : (
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "attn2.to_q",
        "attn2.to_k",
        "attn2.to_v",
        "attn2.to_out.0",
        "ff.net.0.proj",
        "ff.net.2"
    ),
    r"down_blocks\.1\.resnets\.0" : (
        "time_emb_proj",
    ),
    r"down_blocks\.1\.resnets\.1" : (
        "time_emb_proj",
    ),
    r"down_blocks\.2\.attentions\.\d+\.transformer_blocks\.\d+" : (
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "attn2.to_q",
        "attn2.to_k",
        "attn2.to_v",
        "attn2.to_out.0",
        "ff.net.0.proj",
        "ff.net.2"
    ),
    r"down_blocks\.2\.resnets\.0" : (
        "time_emb_proj",
    ),
    r"down_blocks\.2\.resnets\.1" : (
        "time_emb_proj",
    ),
    
    
    r"up_blocks\.0\.attentions\.\d+\.transformer_blocks\.\d+" : (
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "attn2.to_q",
        "attn2.to_k",
        "attn2.to_v",
        "attn2.to_out.0",
        "ff.net.0.proj",
        "ff.net.2"
    ),
    r"up_blocks\.0\.resnets\.\d+" : (
        "time_emb_proj",
    ),
    r"up_blocks\.1\.attentions\.\d+\.transformer_blocks\.\d+" : (
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "attn2.to_q",
        "attn2.to_k",
        "attn2.to_v",
        "attn2.to_out.0",
        "ff.net.0.proj",
        "ff.net.2"
    ),
    r"up_blocks\.1\.resnets\.\d+" : (
        "time_emb_proj",
    ),
    r"up_blocks\.2\.resnets\.\d+" : (
        "time_emb_proj",
    ),
    
    
    r"mid_block\.attentions\.0\.transformer_blocks\.\d+" : (
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "attn2.to_q",
        "attn2.to_k",
        "attn2.to_v",
        "attn2.to_out.0",
        "ff.net.0.proj",
        "ff.net.2"
    ),
    r"mid_block\.resnets\.\d+" : (
        "time_emb_proj",
    )
}
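To make the pattern_dict easier to read, here is a small sketch of how its entries select layers. It assumes each key must match an entire module path (re.fullmatch) and that each value lists nn.Linear sub-layers under the matched module; the real DFloat11 matching logic may differ in detail. The dict is a two-entry excerpt of the one above, and the module paths are hand-picked examples in diffusers' UNet naming style.

```python
import re

# Two entries copied from the pattern_dict above.
pattern_dict = {
    r"down_blocks\.1\.attentions\.\d+\.transformer_blocks\.\d+": (
        "attn1.to_q", "attn1.to_k", "attn1.to_v", "attn1.to_out.0",
        "attn2.to_q", "attn2.to_k", "attn2.to_v", "attn2.to_out.0",
        "ff.net.0.proj", "ff.net.2",
    ),
    r"mid_block\.resnets\.\d+": ("time_emb_proj",),
}


def selected_layers(module_names, patterns):
    """List the sub-layer paths a pattern dict would select.

    Assumption: a key must match the *whole* module path (re.fullmatch)."""
    selected = []
    for name in module_names:
        for pattern, sublayers in patterns.items():
            if re.fullmatch(pattern, name):
                selected.extend(f"{name}.{sub}" for sub in sublayers)
    return selected


# Hand-picked module paths in diffusers' UNet naming style.
names = [
    "down_blocks.1.attentions.0.transformer_blocks.0",
    "down_blocks.1.attentions.0.transformer_blocks.1",
    "down_blocks.1.attentions.0.proj_in",  # outside transformer_blocks: not matched
    "mid_block.resnets.0",
    "mid_block.resnets.0.conv1",           # a Conv2d in ResnetBlock2D: not matched
]
selected = selected_layers(names, pattern_dict)
print(len(selected))
```

Under this reading, each transformer block contributes its ten Linear layers and the resnet contributes its time_emb_proj, for 21 selected layers in total, while the proj_in and conv1 paths are skipped.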