---
license: other
license_name: tencent-hunyuan-community
license_link: LICENSE
pipeline_tag: text-to-image
library_name: transformers
language:
- zh
- en
tasks:
- text-to-image-synthesis
frameworks: PyTorch
base_model:
- tencent/HunyuanImage-3.0
base_model_relation: quantized
---


---

This model is a qint4-quantized version of https://huggingface.co/tencent/HunyuanImage-3.0, quantized with https://github.com/huggingface/optimum-quanto. The weight files were saved using an unofficial method.

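
Some background on the qint4 format: 4-bit weight quantization stores each group of weights as small signed integers plus one floating-point scale. The pure-Python sketch below shows a simplified symmetric, per-group scheme. It is not optimum-quanto's actual implementation, whose qint4 layout may differ (for example, by using a zero-point). The group size of 4 is an arbitrary choice for the example. In optimum-quanto itself, weights are typically quantized with `quantize(model, weights=qint4)` followed by `freeze(model)`. This card does not say whether the weights here were produced exactly that way.

```python
from typing import List, Tuple

QMIN, QMAX = -8, 7  # value range of a signed 4-bit integer


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric 4-bit quantization of one group, so that w ~= q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from 4-bit integers and a scale."""
    return [v * scale for v in q]


def quantize(weights: List[float], group_size: int = 4) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into groups and quantize each group independently."""
    return [
        quantize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    w = [0.12, -0.50, 0.33, 0.07, 1.40, -0.90, 0.00, 0.25]
    groups = quantize(w)
    restored = [x for q, s in groups for x in dequantize_group(q, s)]
    print(groups)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(w, restored)))
```

Each group gets its own scale, so a single large weight only reduces precision within its own group. The per-weight rounding error is at most half of that group's scale.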

This quantized model has so far been tested on a single H20 GPU (96 GB). It is loaded with unofficial code; see `load_quantized_model.py` for details. That script currently provides two loading methods for reference. Discussion and joint research are welcome. Thank you!

Loading method 1: initializing and loading the model requires about 160 GB of CPU memory, and the GPU initially uses 50 GB. Once inference starts, CPU usage drops to about 70 GB and GPU usage is about 55-60 GB. Warnings about model keys appear during loading, but they do not affect usage.

Loading method 2: initializing and loading the model requires about 75 GB of CPU memory, and the GPU initially uses 50 GB. Once inference starts, CPU usage stays at about 75 GB and GPU usage is about 55-60 GB. Because a key map is provided, no warnings appear during loading.

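
Loading method 2 avoids key warnings by supplying a key map. The contents of `load_quantized_model.py` are not reproduced in this card, so the snippet below is only a hypothetical illustration of the general technique. It renames a checkpoint's state-dict keys to match the parameter names the model expects, then reports which keys would still be missing or unexpected, which is the usual source of such loading warnings. The function names `remap_state_dict` and `check_keys` and all key names are invented for this sketch.

```python
from typing import Any, Dict, Iterable, Set, Tuple


def remap_state_dict(state_dict: Dict[str, Any], key_map: Dict[str, str]) -> Dict[str, Any]:
    """Rename checkpoint keys via key_map; keys absent from the map are kept as-is."""
    remapped: Dict[str, Any] = {}
    for old_key, value in state_dict.items():
        new_key = key_map.get(old_key, old_key)
        if new_key in remapped:
            # Two source keys mapping to one target would silently drop a tensor.
            raise ValueError(f"key collision after remapping: {new_key!r}")
        remapped[new_key] = value
    return remapped


def check_keys(state_dict: Dict[str, Any], expected: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) keys, the two categories loaders usually warn about."""
    expected_set = set(expected)
    present = set(state_dict)
    return expected_set - present, present - expected_set


if __name__ == "__main__":
    # Hypothetical key names, chosen only for illustration.
    checkpoint = {"blk0.qweight": "tensor-A", "blk0.qscale": "tensor-B"}
    model_keys = ["layers.0.weight", "layers.0.weight_scale"]
    key_map = {"blk0.qweight": "layers.0.weight", "blk0.qscale": "layers.0.weight_scale"}

    print(check_keys(checkpoint, model_keys))  # every key is missing or unexpected
    fixed = remap_state_dict(checkpoint, key_map)
    print(check_keys(fixed, model_keys))  # (set(), set())
```

The collision check is a design choice made for this sketch. Without it, a bad map could overwrite one tensor with another and no warning would be raised.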

Inference time is roughly the same for both methods: about 12 minutes per image on an H20 at a 9:16 or 16:9 aspect ratio.

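
The memory figures above can be roughly cross-checked with a simple rule: weight storage equals parameter count times bits per weight. The sketch below uses the 80-billion-parameter total reported in the Key Features section. It ignores quantization scales, layers kept at higher precision, activations and framework overhead, so it gives only a lower bound, not a breakdown of the measured numbers.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Decimal gigabytes needed to store num_params weights at a given bit width."""
    return num_params * bits_per_weight / 8 / 1e9


# Total parameter count reported for HunyuanImage-3.0 in the Key Features section.
TOTAL_PARAMS = 80e9

if __name__ == "__main__":
    for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{name}: ~{weight_gb(TOTAL_PARAMS, bits):.0f} GB of weights")
```

At 4 bits, the weights alone come to roughly 40 GB. The gap to the reported ~50 GB initial GPU footprint would be expected to come from scales, unquantized layers and runtime buffers, though the source does not break the figure down. The ~160 GB bf16 figure happens to match the CPU requirement of loading method 1. The source does not say that this method holds a full-precision copy in memory.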

<img src="./example.jpg" alt="Example Generated Image" width="800">

---


### HunyuanImage-3.0 is an outstanding native multimodal mixture-of-experts (MoE) model

The introduction below is quoted from the official model page. The model and code provided by this project are intended solely for community sharing and technical research and study. Please comply with the official Tencent Hunyuan license terms.

---

<div align="center">

<img src="./logo.png" alt="HunyuanImage-3.0 Logo" width="600">

# 🎨 HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

</div>

<div align="center">
<img src="./banner.png" alt="HunyuanImage-3.0 Banner" width="800">
</div>

<div align="center">
<a href=https://hunyuan.tencent.com/image target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
<a href=https://huggingface.co/tencent/HunyuanImage-3.0 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
<a href=https://github.com/Tencent-Hunyuan/HunyuanImage-3.0 target="_blank"><img src=https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
<a href=https://arxiv.org/pdf/2509.23951 target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
<a href=https://docs.qq.com/doc/DUVVadmhCdG9qRXBU target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a>
</div>

<p align="center">
👏 Join our <a href="./assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
💻 <a href="https://hunyuan.tencent.com/modelSquare/home/play?modelId=289&from=/visual">Official website: try our model!</a>
</p>

## 🔥🔥🔥 News

- **September 28, 2025**: 📖 **HunyuanImage-3.0 Technical Report Released** - Comprehensive technical documentation now available
- **September 28, 2025**: 🚀 **HunyuanImage-3.0 Open Source Release** - Inference code and model weights publicly available

## 🧩 Community Contributions

If you develop with or use HunyuanImage-3.0 in your projects, you are welcome to let us know.

## 📑 Open-source Plan

- HunyuanImage-3.0 (Image Generation Model)
  - [x] Inference
  - [x] HunyuanImage-3.0 Checkpoints
  - [ ] HunyuanImage-3.0-Instruct Checkpoints (with reasoning)
  - [ ] vLLM Support
  - [ ] Distilled Checkpoints
  - [ ] Image-to-Image Generation
  - [ ] Multi-turn Interaction

## 🗂️ Contents

- [🔥🔥🔥 News](#-news)
- [🧩 Community Contributions](#-community-contributions)
- [📑 Open-source Plan](#-open-source-plan)
- [📖 Introduction](#-introduction)
- [✨ Key Features](#-key-features)
- [📚 Citation](#-citation)
- [🙏 Acknowledgements](#-acknowledgements)
- [🌟🚀 Github Star History](#-github-star-history)

---

## 📖 Introduction

**HunyuanImage-3.0** is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image module achieves performance **comparable to or surpassing** leading closed-source models.

<div align="center">
<img src="./framework.png" alt="HunyuanImage-3.0 Framework" width="90%">
</div>

## ✨ Key Features

* 🧠 **Unified Multimodal Architecture:** Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.

* 🏆 **The Largest Image Generation MoE Model:** This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.

* 🎨 **Superior Image Generation Performance:** Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.

* 💭 **Intelligent World-Knowledge Reasoning:** The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.

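
The MoE figures above (64 experts, 80 billion total parameters, 13 billion activated per token, about 16%) follow from sparse expert routing. Each token is sent to only a few experts, so most parameters sit idle for any given token. The pure-Python sketch below shows generic top-k softmax gating for one scalar token. The value of `k`, the toy experts and the function names are assumptions made for illustration. The source does not describe HunyuanImage-3.0's actual router.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int,
) -> float:
    """Route one token to its top-k experts and mix their outputs by gate weight."""
    probs = softmax(router_logits)
    # Indices of the k highest-probability experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize gate weights over the chosen experts only.
    norm = sum(probs[i] for i in chosen)
    # Only the chosen experts are evaluated; all others stay inactive.
    return sum(probs[i] / norm * experts[i](x) for i in chosen)


if __name__ == "__main__":
    # Four toy "experts" that simply scale their input.
    experts = [lambda v, s=s: v * s for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, 0.3, 2.0]
    print(top_k_moe(1.0, logits, experts, k=2))  # 3.0
```

With `k=2` of four experts, only half of the toy expert parameters are touched per token. In a 64-expert model, the same mechanism keeps per-token compute far below what the total parameter count would suggest.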
## 📚 Citation

If you find HunyuanImage-3.0 useful in your research, please cite our work:

```bibtex
@article{cao2025hunyuanimage,
  title={HunyuanImage 3.0 Technical Report},
  author={Cao, Siyu and Chen, Hangting and Chen, Peng and Cheng, Yiji and Cui, Yutao and Deng, Xinchi and Dong, Ying and Gong, Kipper and Gu, Tianpeng and Gu, Xiusen and others},
  journal={arXiv preprint arXiv:2509.23951},
  year={2025}
}
```

## 🙏 Acknowledgements

We extend our heartfelt gratitude to the following open-source projects and communities for their invaluable contributions:

* 🤗 [Transformers](https://github.com/huggingface/transformers) - State-of-the-art NLP library
* 🎨 [Diffusers](https://github.com/huggingface/diffusers) - Diffusion models library
* 🌐 [HuggingFace](https://huggingface.co/) - AI model hub and community
* ⚡ [FlashAttention](https://github.com/Dao-AILab/flash-attention) - Memory-efficient attention
* 🚀 [FlashInfer](https://github.com/flashinfer-ai/flashinfer) - Optimized inference engine

## 🌟🚀 Github Star History

[Tencent-Hunyuan/HunyuanImage-3.0 on GitHub](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)

[Star History Chart](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)