Boosting Generative Image Modeling via Joint Image-Feature Synthesis

arXiv: https://arxiv.org/abs/2504.16064

ReDi learns to generate coherent image-feature pairs from pure noise, significantly enhancing both generative quality and training efficiency.
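As a rough illustration of joint image-feature synthesis, the sketch below noises an image latent and a feature array to the same time t and builds a velocity target for each part. It is not ReDi's code. The following are assumptions made only for illustration: a linear interpolant x_t = (1 - t) x_0 + t ε, a single t shared by both parts, a loss that sums the image MSE and a weighted feature MSE, and the array shapes. The names `noise_pair`, `joint_loss`, and `feat_weight` are hypothetical. How the features are extracted is not covered.

```python
import numpy as np


def noise_pair(x_img, x_feat, t, rng):
    """Noise an image latent and its feature array to the same time t.

    Assumed linear interpolant: x_t = (1 - t) * x_0 + t * eps, with the
    velocity target v = dx_t/dt = eps - x_0 (an illustrative choice).
    """
    eps_img = rng.standard_normal(x_img.shape)
    eps_feat = rng.standard_normal(x_feat.shape)
    xt = ((1.0 - t) * x_img + t * eps_img, (1.0 - t) * x_feat + t * eps_feat)
    v = (eps_img - x_img, eps_feat - x_feat)
    return xt, v


def joint_loss(v_pred, v_target, feat_weight=1.0):
    """Hypothetical joint objective: image MSE plus weighted feature MSE."""
    img_mse = np.mean((v_pred[0] - v_target[0]) ** 2)
    feat_mse = np.mean((v_pred[1] - v_target[1]) ** 2)
    return img_mse + feat_weight * feat_mse


rng = np.random.default_rng(0)
x_img = rng.standard_normal((4, 32, 32))  # hypothetical latent shape
x_feat = rng.standard_normal((256, 8))    # hypothetical feature tokens
(xt_img, xt_feat), v = noise_pair(x_img, x_feat, t=0.3, rng=rng)
print(joint_loss(v, v))  # a perfect velocity prediction gives 0.0
```

Under these assumptions, sampling would start both parts from pure Gaussian noise at t = 1 and integrate the predicted velocities back to t = 0, so the image and its features are generated together.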


Model Description

This model uses SiT-XL/2 as the base model. We train it for 4M steps with a batch size of 256 on ImageNet at 256×256 resolution.
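The training budget above can be converted into total images seen and approximate epochs. This sketch assumes the standard ImageNet-1k (ILSVRC-2012) training split of 1,281,167 images, which the card does not state explicitly.

```python
# Training budget stated above: 4M steps at batch size 256.
STEPS = 4_000_000
BATCH_SIZE = 256
# Assumed dataset size: the ImageNet-1k (ILSVRC-2012) training split.
IMAGENET_TRAIN_IMAGES = 1_281_167

images_seen = STEPS * BATCH_SIZE
epochs = images_seen / IMAGENET_TRAIN_IMAGES
print(f"{images_seen:,} images, about {epochs:.0f} epochs")
# prints "1,024,000,000 images, about 799 epochs"
```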

Metrics

Generative performance on the ImageNet validation set. Lower FID and sFID are better; higher IS, precision, and recall are better.

Model               FID    sFID   IS      Precision   Recall
SiT-XL/2 w/ ReDi    1.64   4.63   289.3   0.65        0.77
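FID, the headline metric here, is the Fréchet distance between Gaussians fitted to Inception-v3 features of real and generated images. The sketch below computes it from precomputed means and covariances using only NumPy. Feature extraction and the other metrics (sFID, IS, precision, recall) are out of scope. It is an illustrative reimplementation, not the evaluation code behind the numbers above, and small numerical differences from that code are expected. The helper names are hypothetical.

```python
import numpy as np


def _sqrt_psd(m):
    """Symmetric square root of a positive semi-definite matrix."""
    m = 0.5 * (m + m.T)  # remove tiny asymmetries from round-off
    w, q = np.linalg.eigh(m)
    return (q * np.sqrt(np.clip(w, 0.0, None))) @ q.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2}).
    sigma1 @ sigma2 and s1h @ sigma2 @ s1h (s1h = sigma1^{1/2}) share their
    eigenvalues, so the trace of the cross term uses the symmetric form.
    """
    diff = mu1 - mu2
    s1h = _sqrt_psd(sigma1)
    cross = _sqrt_psd(s1h @ sigma2 @ s1h)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(cross))


# Toy 2-D Gaussians; real FID uses 2048-d Inception-v3 feature statistics.
fid = frechet_distance(np.zeros(2), np.eye(2),
                       np.array([3.0, 0.0]), np.diag([4.0, 1.0]))
print(round(fid, 6))  # 10.0
```

In the toy example the mean term contributes 9 and the covariance term contributes (1 + 1) + (4 + 1) - 2(2 + 1) = 1, giving 10.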
