UNet

A lightweight UNet with single-block levels and sliding window attention.
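The card does not say how the sliding window is shaped or how large it is. Below is a minimal NumPy sketch of multi-head self-attention restricted to a local window. It assumes a 1D window over a flattened token sequence, and the function names are hypothetical. For clarity it builds the full n × n score matrix and masks it, which an efficient kernel would avoid. The head count is an ordinary argument, so it can differ from layer to layer.

```python
import numpy as np


def sliding_window_mask(n_tokens: int, window: int) -> np.ndarray:
    """Boolean (n, n) mask: token i may attend to token j iff |i - j| <= window // 2."""
    idx = np.arange(n_tokens)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2


def sliding_window_attention(x, w_q, w_k, w_v, n_heads: int, window: int):
    """Hypothetical local multi-head self-attention.

    x: (n_tokens, d_model); w_q, w_k, w_v: (d_model, d_model).
    Returns an array with the same shape as x.
    """
    n, d = x.shape
    assert d % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d // n_heads

    def split(t):
        # (n, d) -> (n_heads, n, d_head)
        return t.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (h, n, n)

    # Tokens outside the window get -inf, i.e. zero weight after softmax.
    # The diagonal is always inside the window, so every row keeps a finite max.
    mask = sliding_window_mask(n, window)
    scores = np.where(mask[None], scores, -np.inf)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = weights @ v  # (h, n, d_head)
    return out.transpose(1, 0, 2).reshape(n, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 32
    x = rng.standard_normal((16, d))
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Illustrative per-layer head counts; the card only says heads vary across layers.
    for heads in (2, 4, 8):
        y = sliding_window_attention(x, w_q, w_k, w_v, n_heads=heads, window=5)
        print(heads, y.shape)
```

Because of the window, perturbing a token far from position i leaves the output at i unchanged within a single layer. Stacking layers, or the UNet's downsampling, is what widens the effective receptive field.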

  • Pixel-space model in CIELAB color space
  • LAB input, RGB output
  • Input images decomposed into frequency-domain components
  • Docling as the text encoder
  • Token-efficient visual text inputs
  • Variable number of attention heads across layers
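The card names the input color space but does not say which frequency transform is used or how inputs are scaled. The NumPy sketch below is one plausible preprocessing path and should not be read as the repo's method. It converts sRGB to CIELAB using the standard D65 formulas. It then applies a single-level orthonormal 2D Haar wavelet transform, which was chosen here as an assumption. All function names are hypothetical.

```python
import numpy as np

# Linear sRGB -> CIE XYZ matrix and D65 reference white (standard values).
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """sRGB in [0, 1], shape (..., 3) -> CIELAB (L, a, b), shape (..., 3)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer curve.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = lin @ _RGB_TO_XYZ.T / _WHITE_D65
    delta = 6 / 29
    f = np.where(xyz > delta**3, np.cbrt(xyz), xyz / (3 * delta**2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform over the last two axes (even sizes)."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-pass (local average)
    lh = (a - b + c - d) / 2  # differences between horizontal neighbours
    hl = (a + b - c - d) / 2  # differences between vertical neighbours
    hh = (a - b - c + d) / 2  # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def preprocess(rgb):
    """Hypothetical: (H, W, 3) sRGB -> (12, H/2, W/2) stack of LAB Haar sub-bands."""
    lab = srgb_to_lab(rgb)               # (H, W, 3)
    chw = np.moveaxis(lab, -1, 0)        # (3, H, W)
    bands = haar_dwt2(chw)               # 4 arrays of shape (3, H/2, W/2)
    return np.concatenate(bands, axis=0)  # (12, H/2, W/2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))  # stand-in for an sRGB image in [0, 1]
    print(preprocess(img).shape)   # (12, 32, 32)
```

The Haar transform is lossless and energy-preserving, so no information is discarded. It also halves the spatial resolution the first UNet level has to process, which is one reason wavelet-style inputs are popular in pixel-space models.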

Retrospection

Reconstruction quality, from best to worst:

  • U-Docling (this repo)
  • U-DAE
  • U-DAE-NLL
  • EQ-SAE-CIELAB
  • EQ-SAE-CIELAB-c8
  • VAE-f16-c4-kv
  • VAE-f16-c4
  • VAE-f16-c8

References

  • arXiv:2411.17459
  • arXiv:2503.11576
  • arXiv:2510.17800
  • arXiv:2510.18279