nightknocker/u-slider-text-encoder-docling-258m


UNet

A lightweight UNet with single-block levels and sliding window attention.
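
To make the sliding window attention idea concrete, here is a minimal NumPy sketch of single-head attention in which each position attends only to neighbors within a fixed distance. This is an illustration only, not this repo's implementation: the function names, the symmetric 1D window, and the `window` parameter are assumptions, and the model card does not state the window size or layout used.

```python
import numpy as np


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask, True where query i may attend to key j (|i - j| <= window)."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window


def sliding_window_attention(q, k, v, window: int) -> np.ndarray:
    """Single-head scaled dot-product attention restricted to a local window.

    q, k, v: arrays of shape (seq_len, dim). `window` is a hypothetical
    hyperparameter, not a value taken from the model card.
    """
    dim = q.shape[-1]
    scores = q @ k.T / np.sqrt(dim)
    mask = sliding_window_mask(q.shape[0], window)
    # Positions outside the window get -inf so they receive zero weight.
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax; the diagonal is always unmasked,
    # so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `window=0` every token attends only to itself and the output equals `v`; with `window >= seq_len - 1` the mask is all True and this reduces to ordinary full attention.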

Pixel-space model in CIELAB color space
LAB input, RGB output
Decompose the input images into their frequency-domain components
Docling as text encoder
Token-efficient visual text inputs
Variable head in the attention modules across the layers
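
The CIELAB input and frequency decomposition steps listed above can be sketched in NumPy as follows. This is a sketch under stated assumptions: the sRGB-to-LAB conversion uses the standard D65 formulas, but the model card does not say which frequency transform is used, so a simple FFT low-pass/high-pass split stands in for it here. The function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np

# Linear sRGB -> CIE XYZ matrix (D65) and the D65 reference white.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb) -> np.ndarray:
    """Convert sRGB values in [0, 1], shape (..., 3), to CIELAB."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer function to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    t = (linear @ _RGB_TO_XYZ.T) / _WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def split_frequency_bands(channel: np.ndarray, cutoff: float = 0.25):
    """Split a 2D channel into low- and high-frequency parts with an FFT mask.

    `cutoff` is a hypothetical fraction of the Nyquist frequency; the FFT
    split is an assumed stand-in, not the transform used by this repo.
    """
    h, w = channel.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff * 0.5
    low = np.fft.ifft2(np.fft.fft2(channel) * low_mask).real
    high = channel - low  # by construction, low + high == channel
    return low, high
```

A typical use would be to convert an image with `srgb_to_lab` and then call `split_frequency_bands` on each of the L, a, and b channels separately.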

Retrospection

Reconstruction quality, from best to worst:

U-Docling (this repo)
U-DAE
U-DAE-NLL
EQ-SAE-CIELAB
EQ-SAE-CIELAB-c8
VAE-f16-c4-kv
VAE-f16-c4
VAE-f16-c8

References

2411.17459
2503.11576
2510.17800
2510.18279
