zelaki/SiT-ReDi-XL-2

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

ReDi learns to generate coherent image-feature pairs from pure noise, significantly enhancing both generative quality and training efficiency.

This model uses SiT-XL/2 as the base model. It was trained for 4M steps with a batch size of 256 on ImageNet at 256x256 resolution.
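To put the training budget of 4M steps at batch size 256 in perspective, the arithmetic can be sketched as below. This is a rough estimate, not a figure from the authors: it assumes "ImageNet" refers to the ImageNet-1k (ILSVRC-2012) training split with 1,281,167 images, and the helper name `training_budget` is hypothetical.

```python
# Assumption: training used the ImageNet-1k (ILSVRC-2012) train split.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167


def training_budget(steps, batch_size, dataset_size=IMAGENET_1K_TRAIN_IMAGES):
    """Return (samples_seen, epochs) for a run with a fixed batch size."""
    samples_seen = steps * batch_size
    epochs = samples_seen / dataset_size
    return samples_seen, epochs


samples, epochs = training_budget(steps=4_000_000, batch_size=256)
print(f"samples seen: {samples:,}")      # samples seen: 1,024,000,000
print(f"approx. epochs: {epochs:.0f}")   # approx. epochs: 799
```

Under that assumption, the run corresponds to roughly 800 passes over the training set.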

Generative performance on the ImageNet validation set:

Model	FID	sFID	IS	Precision	Recall
SiT-XL/2 w/ ReDi	1.64	4.63	289.3	0.65	0.77