Boosting Generative Image Modeling via Joint Image-Feature Synthesis
arXiv: https://arxiv.org/abs/2504.16064
ReDi learns to generate coherent image-feature pairs from pure noise, significantly enhancing both generative quality and training efficiency.
Model Description
This model uses SiT-XL/2 as the base model. It was trained for 4M steps with a batch size of 256 on ImageNet 256x256.
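To put the training budget in perspective, the short sketch below converts the step count and batch size into images seen and approximate epochs. The function name `training_summary` is hypothetical, and the dataset size is an assumption: the card does not state it, so the sketch uses the standard ImageNet-1k (ILSVRC-2012) training split of 1,281,167 images.

```python
# Assumption: training used the standard ImageNet-1k (ILSVRC-2012) train split
# of 1,281,167 images. The model card does not state the dataset size.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167


def training_summary(steps: int, batch_size: int,
                     dataset_size: int = IMAGENET_1K_TRAIN_IMAGES) -> dict:
    """Return total images seen and the equivalent number of epochs."""
    samples_seen = steps * batch_size
    epochs = samples_seen / dataset_size
    return {"samples_seen": samples_seen, "epochs": epochs}


# Values taken from the model description: 4M steps, batch size 256.
summary = training_summary(steps=4_000_000, batch_size=256)
print(f"{summary['samples_seen']:,} images seen, "
      f"about {summary['epochs']:.0f} epochs")
```

Under that assumption, the run processes roughly 1.02 billion images, or just under 800 passes over the training set.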
Metrics
Generative performance on the ImageNet validation set.
| Model | FID | sFID | IS | Precision | Recall |
|---|---|---|---|---|---|
| SiT-XL/2 w/ ReDi | 1.64 | 4.63 | 289.3 | 0.65 | 0.77 |