<div align="center">

## FaceCLIP
[**Zichuan Liu**](https://scholar.google.com/citations?user=-H18WY8AAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
[**Liming Jiang**](https://liming-jiang.com/)&nbsp;&nbsp;&nbsp;&nbsp;
[**Qing Yan**](https://scholar.google.com/citations?user=0TIYjPAAAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
[**Yumin Jia**](https://www.linkedin.com/in/yuminjia/)&nbsp;&nbsp;&nbsp;&nbsp;
[**Hao Kang**](https://scholar.google.com/citations?user=VeTCSyEAAAAJ)&nbsp;&nbsp;&nbsp;&nbsp;
[**Xin Lu**](https://scholar.google.com/citations?user=mFC0wp8AAAAJ)<br />
ByteDance Intelligent Creation<br />

<a href="TBD"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=blue&logo=github-pages"></a> &ensp;
<a href="TBD"><img src="https://img.shields.io/static/v1?label=ArXiv&message=Paper&color=darkred&logo=arxiv"></a> &ensp;
<a href="TBD"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%96%20Released&message=Models&color=green"></a> &ensp;
<a href="TBD"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Demo&color=orange"></a> &ensp;

</div>

> **Abstract:** *Recent progress in text-to-image (T2I) diffusion models has greatly improved image quality and flexibility. However, a major challenge in personalized generation remains: preserving the subject’s identity (ID) while allowing diverse visual changes. We address this with a new framework for ID-preserving image generation. Instead of relying on adapter modules to inject identity features into pre-trained models, we propose a unified multi-modal encoding strategy that jointly captures identity and text information. Our method, called FaceCLIP, learns a shared embedding space for facial identity and textual semantics. Given a reference face image and a text prompt, FaceCLIP produces a joint representation that guides the generative model to synthesize images consistent with both the subject’s identity and the prompt. To train FaceCLIP, we introduce a multi-modal alignment loss that aligns features across the face, text, and image domains. We then integrate FaceCLIP with existing UNet and Diffusion Transformer (DiT) architectures, forming a complete synthesis pipeline, FaceCLIP-x. Compared to existing ID-preserving approaches, our method produces more photorealistic portraits with better identity retention and text alignment. Extensive experiments demonstrate that FaceCLIP-x outperforms prior methods in both qualitative and quantitative evaluations.*

![teaser](./asset/readme/teaser.jpeg)
![demo](./asset/readme/demo.jpeg)
![architecture](./asset/readme/arch.jpeg)

## Model Zoo

| Version | Description |
|:-------------:|:-------------------------------------------------------------------------:|
| FaceCLIP-SDXL | SDXL base model trained with the FaceCLIP-L-14 and FaceCLIP-bigG-14 encoders. |
| FaceT5-FLUX | FLUX.1-dev base model trained with the FaceT5 encoder. |
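
Checkpoints are linked from the "Released Models" badge above once available. As a minimal sketch of fetching a checkpoint with the Hugging Face hub client (the repository id below is a placeholder, not a confirmed location), one could do:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- consult the "Released Models" badge for the actual location.
snapshot_download(repo_id="ByteDance/FaceCLIP", local_dir="./checkpoints/FaceCLIP")
```
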
## Requirements and Installation

### 1. Download the ArcFace models
```python
from huggingface_hub import hf_hub_download

# Arc2Face diffusion weights and config.
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/config.json", local_dir="./ArcFace")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/diffusion_pytorch_model.safetensors", local_dir="./ArcFace")
# Arc2Face encoder weights and config.
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/config.json", local_dir="./ArcFace")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/pytorch_model.bin", local_dir="./ArcFace")
# ArcFace recognition model (ONNX) used for identity embeddings.
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arcface.onnx", local_dir="./ArcFace/antelopev2")
```
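
The `arcface.onnx` file fetched above is the ArcFace recognition backbone that produces identity embeddings. As an illustration only (a minimal sketch, not the official pipeline; it assumes an already-aligned 112×112 face crop at a hypothetical path), such an embedding can be computed with the `insightface` package:

```python
import cv2
from insightface.model_zoo import get_model

# Load the ArcFace ONNX model downloaded in step 1.
arcface = get_model("./ArcFace/antelopev2/arcface.onnx")
arcface.prepare(ctx_id=-1)  # -1 runs on CPU; pass a GPU id if available

# get_feat expects an aligned 112x112 BGR face crop (path is hypothetical).
crop = cv2.imread("./assets/examples/man_aligned.jpg")
embedding = arcface.get_feat(crop)
print(embedding.shape)  # (1, 512) identity vector
```
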
### 2. Install other dependencies
```bash
bash setup.bash
```
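
After the script finishes, a quick import check can confirm the environment is usable (a hedged check: it assumes the stack this README references, namely `diffusers`, `insightface`, and `huggingface_hub`):

```python
# Sanity-check that the core libraries referenced in this README import cleanly.
import diffusers
import huggingface_hub
import insightface

print(diffusers.__version__, huggingface_hub.__version__, insightface.__version__)
```
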

## Inference

### Local Inference Script

```bash
python3 test.py --id_image ./assets/examples/man.jpg --prompt "A man, portrait, cinematic" --out_results_dir ./results
```
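
To sweep several reference images with the same settings, the script can be driven in a loop; this is a convenience sketch that reuses only the flags documented above (the glob path is illustrative):

```python
import subprocess
from pathlib import Path

# Batch over reference images, reusing the documented test.py flags.
for img in sorted(Path("./assets/examples").glob("*.jpg")):
    subprocess.run(
        ["python3", "test.py",
         "--id_image", str(img),
         "--prompt", "A person, portrait, cinematic",
         "--out_results_dir", "./results"],
        check=True,
    )
```
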

## 📜 Disclaimer and Licenses

The images used in this repository and related demos are obtained from consenting subjects or generated by the models. They are intended solely to showcase the capabilities of this research. If you have any concerns, please contact us, and we will promptly remove any inappropriate content.

The use of the released code, model, and demo must strictly adhere to the respective licenses. Our code is released under the [Apache License 2.0](./LICENSE), and our model is released under the [Creative Commons Attribution-NonCommercial 4.0 International Public License](https://huggingface.co/ByteDance/InfiniteYou/blob/main/LICENSE) for academic research purposes only. Any manual or automatic downloading of the face models from [InsightFace](https://github.com/deepinsight/insightface), the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) base model, LoRAs ([Realism](https://civitai.com/models/631986?modelVersionId=706528) and [Anti-blur](https://civitai.com/models/675581/anti-blur-flux-lora)), *etc.*, must follow their original licenses and be used only for academic research purposes.

This research aims to positively impact the field of generative AI. Any use of this method must be responsible and comply with local laws. The developers do not assume any responsibility for potential misuse.

## 🤗 Acknowledgments

We would like to express our gratitude to the authors of the following repositories, from which we referenced code, models, or assets:
<br />https://github.com/foivospar/Arc2Face
<br />https://huggingface.co/black-forest-labs/FLUX.1-dev
<br />https://github.com/huggingface/diffusers
<br />https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

## 📖 Citation

If you find FaceCLIP useful for your research or applications, please cite our paper:

```bibtex
@article{liu2025learning,
  title={Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis},
  author={Liu, Zichuan and Jiang, Liming and Yan, Qing and Jia, Yumin and Kang, Hao and Lu, Xin},
  journal={arXiv preprint arXiv:2504.14202},
  year={2025}
}
```