UNet
A lightweight UNet with single-block levels and sliding window attention.
- Pixel-space model in CIELAB color space
- LAB input, RGB output
- Decompose the input images into their frequency-domain components
- Docling as text encoder
- Token-efficient visual text inputs
- Variable number of heads in the attention modules across layers
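The sketch below illustrates three of the ideas listed above in plain Python: a sliding-window attention mask, an sRGB to CIELAB conversion, and a frequency split of an image. It is not taken from this repo. The function names (`sliding_window_mask`, `srgb_to_lab`, `haar_split`) are hypothetical, the D65 white point is an assumption, and a one-level Haar wavelet stands in for the frequency decomposition, since the transform actually used is not stated here.

```python
import math


def sliding_window_mask(n, window):
    """Return an n x n mask; mask[i][j] is True when |i - j| <= window.

    Assumption: a symmetric (non-causal) window, as is common for images.
    """
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIELAB (D65 assumed)."""
    def to_linear(c):
        # Undo the sRGB transfer curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear RGB -> CIE XYZ (sRGB primaries, D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def haar_split(img):
    """One-level orthonormal 2D Haar split of an even-sized single-channel image.

    `img` is a list of rows. Returns four half-resolution bands:
    low-pass average, column difference, row difference, diagonal difference.
    """
    low, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, len(img), 2):
        bands = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            bands[0].append((a + b + c + d) / 2)  # smooth content
            bands[1].append((a - b + c - d) / 2)  # change across columns
            bands[2].append((a + b - c - d) / 2)  # change across rows
            bands[3].append((a - b - c + d) / 2)  # diagonal change
        low.append(bands[0])
        col_diff.append(bands[1])
        row_diff.append(bands[2])
        diag.append(bands[3])
    return low, col_diff, row_diff, diag


if __name__ == "__main__":
    print(sliding_window_mask(4, 1)[0])
    print(srgb_to_lab(1.0, 1.0, 1.0))
    print(haar_split([[1.0, 1.0], [1.0, 1.0]]))
```

In a real model the mask would be applied to the attention logits, and the Lab channels would be rescaled to a comparable range before the frequency split. Both steps are omitted here for brevity.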
Retrospection
Reconstruction quality, from best to worst:
- U-Docling (this repo)
- U-DAE
- U-DAE-NLL
- EQ-SAE-CIELAB
- EQ-SAE-CIELAB-c8
- VAE-f16-c4-kv
- VAE-f16-c4
- VAE-f16-c8
References
- 2411.17459
- 2503.11576
- 2510.17800
- 2510.18279