---
license: other
license_name: tencent-hunyuan-community
license_link: LICENSE
pipeline_tag: text-to-image
library_name: transformers
language:
- zh
- en
tasks:
- text-to-image-synthesis
frameworks: PyTorch
base_model:
- tencent/HunyuanImage-3.0
base_model_relation: quantized
---
==================================================================================

This model is a qint4-quantized version of https://huggingface.co/tencent/HunyuanImage-3.0, quantized with https://github.com/huggingface/optimum-quanto. The weight files are saved using an unofficial method.

This quantized model has been tested on a single 96GB H20 GPU. Loading uses unofficial code; see load_quantized_model.py, which currently provides two loading methods for reference. Exchange, discussion, and joint study are all welcome. Thank you!

Loading method 1: model initialization requires roughly 160GB of CPU memory, with an initial GPU footprint of 50GB. After inference starts, CPU usage drops to about 70GB and GPU usage is about 55-60GB. Warnings about model state-dict keys appear during loading, but they do not affect usage.

Loading method 2: model initialization requires about 75GB of CPU memory, with an initial GPU footprint of 50GB. After inference starts, CPU usage stays at 75GB and GPU usage is about 55-60GB. Because a key map is provided, no warnings appear during loading.

Both methods have roughly the same inference time: about 12 minutes per image (9:16 / 16:9) on an H20.
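The GPU figures above are consistent with a back-of-envelope estimate of the qint4 weight footprint. The sketch below is purely illustrative arithmetic (the 80B parameter count is from the official model description; overheads for activations, the KV cache, and unquantized layers are not modeled), not a measurement:

```python
# Rough estimate of qint4 weight memory for an 80B-parameter model.
# Illustrative arithmetic only; real footprints include activations,
# KV cache, and any layers kept in higher precision.

TOTAL_PARAMS = 80e9        # total parameters (MoE, all experts)
BITS_PER_WEIGHT = 4        # qint4 quantization
BYTES_PER_GB = 1024 ** 3

weight_bytes = TOTAL_PARAMS * BITS_PER_WEIGHT / 8
weight_gb = weight_bytes / BYTES_PER_GB

print(f"approx. qint4 weight size: {weight_gb:.0f} GB")  # ~37 GB
```

Adding quantization scales, buffers, and runtime activations on top of these ~37GB of packed weights plausibly lands in the observed 50-60GB GPU range.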

<img src="./example.jpg" alt="Example Generated Image" width="800">

==================================================================================

### HunyuanImage-3.0 is an outstanding omni-modal Mixture-of-Experts model! The introduction below is quoted from the official model page. The model and code provided by this project are for community sharing and technical research only; please comply with the official Tencent Hunyuan License.

==================================================================================

<div align="center">

<img src="./logo.png" alt="HunyuanImage-3.0 Logo" width="600">

# 🎨 HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

</div>


<div align="center">
<img src="./banner.png" alt="HunyuanImage-3.0 Banner" width="800">

</div>

<div align="center">
  <a href=https://hunyuan.tencent.com/image target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
  <a href=https://huggingface.co/tencent/HunyuanImage-3.0 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
  <a href=https://github.com/Tencent-Hunyuan/HunyuanImage-3.0 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
  <a href=https://arxiv.org/pdf/2509.23951 target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
  <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
  <a href=https://docs.qq.com/doc/DUVVadmhCdG9qRXBU target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a>
</div>


<p align="center">
    👏 Join our <a href="./assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> | 
💻 <a href="https://hunyuan.tencent.com/modelSquare/home/play?modelId=289&from=/visual">Official website (官网): Try our model!</a>&nbsp;&nbsp;
</p>

## 🔥🔥🔥 News
- **September 28, 2025**: 📖 **HunyuanImage-3.0 Technical Report Released** - Comprehensive technical documentation now available
- **September 28, 2025**: 🚀 **HunyuanImage-3.0 Open Source Release** - Inference code and model weights publicly available


## 🧩 Community Contributions

If you develop or use HunyuanImage-3.0 in your projects, please let us know.

## 📑 Open-source Plan

- HunyuanImage-3.0 (Image Generation Model)
  - [x] Inference 
  - [x] HunyuanImage-3.0 Checkpoints
  - [ ] HunyuanImage-3.0-Instruct Checkpoints (with reasoning)
  - [ ] VLLM Support
  - [ ] Distilled Checkpoints
  - [ ] Image-to-Image Generation
  - [ ] Multi-turn Interaction


## 🗂️ Contents
- [🔥🔥🔥 News](#-news)
- [🧩 Community Contributions](#-community-contributions)
- [📑 Open-source Plan](#-open-source-plan)
- [📖 Introduction](#-introduction)
- [✨ Key Features](#-key-features)
- [🛠️ Dependencies and Installation](#-dependencies-and-installation)
  - [💻 System Requirements](#-system-requirements)
  - [📦 Environment Setup](#-environment-setup)
  - [📥 Install Dependencies](#-install-dependencies)
  - [Performance Optimizations](#performance-optimizations)
- [🚀 Usage](#-usage)
  - [🔥 Quick Start with Transformers](#-quick-start-with-transformers)
  - [🏠 Local Installation & Usage](#-local-installation--usage)
  - [🎨 Interactive Gradio Demo](#-interactive-gradio-demo)
- [🧱 Model Cards](#-models-cards)
- [📝 Prompt Guide](#-prompt-guide)
  - [Manually Writing Prompts](#manually-writing-prompts)
  - [System Prompt For Automatic Rewriting the Prompt](#system-prompt-for-automatic-rewriting-the-prompt)
  - [Advanced Tips](#advanced-tips)
  - [More Cases](#more-cases)
- [📊 Evaluation](#-evaluation)
- [📚 Citation](#-citation)
- [🙏 Acknowledgements](#-acknowledgements)
- [🌟🚀  Github Star History](#-github-star-history)

---

## 📖 Introduction

**HunyuanImage-3.0** is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image module achieves performance **comparable to or surpassing** leading closed-source models.


<div align="center">
  <img src="./framework.png" alt="HunyuanImage-3.0 Framework" width="90%">
</div>

## ✨ Key Features

* 🧠 **Unified Multimodal Architecture:** Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables a more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.

* 🏆 **The Largest Image Generation MoE Model:** This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.

* 🎨 **Superior Image Generation Performance:** Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.

* 💭 **Intelligent World-Knowledge Reasoning:** The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.
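The MoE figures above (64 experts, 13B of 80B parameters active per token) follow from sparse expert routing: a gating network scores the experts and only the top-k contribute to each token. The official router's internals are not described here; the snippet below is a generic top-k softmax gating sketch in plain Python, with a hypothetical 8-expert toy configuration, for illustration only:

```python
import math

def top_k_gate(logits, k):
    """Generic top-k softmax gating: keep the k highest-scoring experts,
    renormalize their softmax weights, and zero out the rest."""
    # Numerically stable softmax over all expert logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected weights so they sum to 1
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# Toy example: 8 experts, route each token to 2 of them
weights = top_k_gate([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 0.3, -0.5], k=2)
print(weights)  # two experts selected; their weights sum to 1
```

Because only the selected experts run a forward pass per token, compute scales with the activated parameter count (here 13B) rather than the total (80B).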

## 📚 Citation

If you find HunyuanImage-3.0 useful in your research, please cite our work:

```bibtex
@article{cao2025hunyuanimage,
  title={HunyuanImage 3.0 Technical Report},
  author={Cao, Siyu and Chen, Hangting and Chen, Peng and Cheng, Yiji and Cui, Yutao and Deng, Xinchi and Dong, Ying and Gong, Kipper and Gu, Tianpeng and Gu, Xiusen and others},
  journal={arXiv preprint arXiv:2509.23951},
  year={2025}
}
```

## 🙏 Acknowledgements

We extend our heartfelt gratitude to the following open-source projects and communities for their invaluable contributions:

* 🤗 [Transformers](https://github.com/huggingface/transformers) - State-of-the-art NLP library
* 🎨 [Diffusers](https://github.com/huggingface/diffusers) - Diffusion models library  
* 🌐 [HuggingFace](https://huggingface.co/) - AI model hub and community
* ⚡ [FlashAttention](https://github.com/Dao-AILab/flash-attention) - Memory-efficient attention
* 🚀 [FlashInfer](https://github.com/flashinfer-ai/flashinfer) - Optimized inference engine

## 🌟🚀 Github Star History

[![GitHub stars](https://img.shields.io/github/stars/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
[![GitHub forks](https://img.shields.io/github/forks/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)


[![Star History Chart](https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanImage-3.0&type=Date)](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)