Abstract
EVTAR is an end-to-end virtual try-on model that simplifies inference to just a source image and a target garment, while using additional reference images to improve try-on accuracy and preserve garment texture and detail.
We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference, which directly fits the target garment onto the person image while incorporating reference images to improve try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, DensePose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy that enables simple inference with only the source image and the target garment as inputs. Our model generates try-on results without masks, DensePose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same clothes to better preserve garment texture and fine-grained details. This mechanism is analogous to how humans consult reference models when choosing outfits, thereby producing a more realistic and higher-quality dressing effect. To support these capabilities, we enrich the training data with supplementary reference images and unpaired person images. We evaluate EVTAR on two widely used benchmarks and across diverse tasks, and the results consistently validate the effectiveness of our approach.
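To make the claimed simplification concrete, the sketch below shows what the mask-free inference interface described in the abstract would look like. This is a minimal illustrative sketch only, not the released API: `TryOnInputs`, `try_on`, and `model.generate` are hypothetical placeholder names, and the actual entry points are in the linked repository.

```python
# Minimal sketch of the inference interface the abstract describes:
# the model consumes only a source person image and a target garment image
# (no agnostic mask, DensePose, or keypoints), plus an optional reference
# image of a different person wearing the same garment. All names here are
# hypothetical placeholders, not EVTAR's released API.
from dataclasses import dataclass
from typing import Optional

from PIL import Image


@dataclass
class TryOnInputs:
    person: Image.Image                       # source photo of the person to dress
    garment: Image.Image                      # image of the target garment
    reference: Optional[Image.Image] = None   # someone else wearing the same garment


def try_on(model, inputs: TryOnInputs) -> Image.Image:
    """Run mask-free try-on. When a reference image is provided, it helps
    the model preserve garment texture and fine-grained detail."""
    return model.generate(
        person=inputs.person,
        garment=inputs.garment,
        reference=inputs.reference,  # may be None at inference time
    )
```

Note the contrast with pipelines that additionally require an agnostic person image, pose maps, or segmentation: here the reference image is the only optional extra, and omitting it still yields a valid try-on result.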
Community
Code is available at: https://github.com/360CVGroup/EVTAR. Model is available at: https://huggingface.co/qihoo360/EVTAR.
The following papers were recommended by the Semantic Scholar API
- Efficient Encoder-Free Pose Conditioning and Pose Control for Virtual Try-On (2025)
- ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On (2025)
- Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection (2025)
- Teleportraits: Training-Free People Insertion into Any Scene (2025)
- PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction&Editing (2025)
- InfiniHuman: Infinite 3D Human Creation with Precise Control (2025)
- UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections (2025)