	---
	license: other
	license_name: tencent-hunyuan-community
	license_link: LICENSE
	pipeline_tag: text-to-image
	library_name: transformers
	language:
	- zh
	- en
	tasks:
	- text-to-image-synthesis
	frameworks: PyTorch
	base_model:
	- tencent/HunyuanImage-3.0
	base_model_relation: quantized
	---
	==================================================================================

	This model is a qint4-quantized version of https://huggingface.co/tencent/HunyuanImage-3.0, quantized with https://github.com/huggingface/optimum-quanto. The weight files are saved using an unofficial method.
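For readers unfamiliar with 4-bit weight quantization, the following is a minimal pure-Python sketch of group-wise affine quantization to 16 levels. It only illustrates the general idea behind an int4 weight format. The function names, the group size of 4, and the rounding scheme are assumptions made for this example; they are not taken from optimum-quanto or from this repository's code.

```python
def quantize_group(values, bits=4):
    """Map a group of floats to integer codes in [0, 2**bits - 1].

    Returns the codes plus the (scale, offset) needed to reconstruct them.
    """
    levels = (1 << bits) - 1  # 15 for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Reconstruct approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


def quantize_weights(weights, group_size=4, bits=4):
    """Quantize a flat weight list group by group (illustrative group size)."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize_weights(groups):
    """Concatenate the reconstructed values of every group."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(dequantize_group(codes, scale, lo))
    return restored


# Made-up weights, used only to exercise the round trip.
weights = [-0.8, -0.1, 0.3, 0.7, 0.02, 0.05, -0.04, 0.0]
packed = quantize_weights(weights)
restored = dequantize_weights(packed)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_error:.4f}")
```

Each group keeps its own scale and offset, so a group with a narrow value range is reconstructed more precisely than one with a wide range.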

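	This quantized model has so far been tested on a single H20 96GB GPU. The model is loaded with unofficial code; see load_quantized_model.py, which currently contains two loading methods for reference. Discussion and joint research and learning are welcome. Thank you!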

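	Loading method 1: initial model loading requires roughly 160GB of CPU memory and initially occupies 50GB of GPU memory. Once inference starts, CPU memory usage drops to about 70GB and GPU memory usage is about 55-60GB. Warnings about model keys appear during loading, but they do not affect usage.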

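	Loading method 2: initial model loading requires about 75GB of CPU memory and initially occupies 50GB of GPU memory. Once inference starts, CPU memory usage stays at 75GB and GPU memory usage is about 55-60GB. Because a key map is provided, no warnings appear during loading.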
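To illustrate why a key map can silence key warnings at load time, here is a minimal sketch that renames checkpoint keys to the names a model expects before loading. The key names, the contents of `KEY_MAP`, and the helpers `remap_keys` and `find_mismatches` are hypothetical and chosen for illustration. The actual mapping lives in load_quantized_model.py and may work differently.

```python
# Hypothetical key names: the real keys in this checkpoint are not shown here.
KEY_MAP = {
    "layers.0.mlp.weight._data": "layers.0.mlp.weight",
    "layers.0.mlp.weight._scale": "layers.0.mlp.weight_scale",
}


def remap_keys(state_dict, key_map):
    """Return a new dict with saved keys renamed to the names the model expects."""
    return {key_map.get(key, key): value for key, value in state_dict.items()}


def find_mismatches(state_dict, expected_keys):
    """Return (missing, unexpected) keys, the kind a loader would warn about."""
    saved = set(state_dict)
    expected = set(expected_keys)
    return sorted(expected - saved), sorted(saved - expected)


# Stand-in checkpoint and model key list, for illustration only.
saved_state = {
    "layers.0.mlp.weight._data": [1, 2, 3],
    "layers.0.mlp.weight._scale": [0.1],
    "layers.0.norm.weight": [1.0],
}
expected_keys = [
    "layers.0.mlp.weight",
    "layers.0.mlp.weight_scale",
    "layers.0.norm.weight",
]

missing_before, unexpected_before = find_mismatches(saved_state, expected_keys)
remapped_state = remap_keys(saved_state, KEY_MAP)
missing_after, unexpected_after = find_mismatches(remapped_state, expected_keys)

print("before remap:", missing_before, unexpected_before)
print("after remap: ", missing_after, unexpected_after)
```

Without the map, the loader sees both missing and unexpected keys. With it, every saved key lines up with an expected one, so there is nothing to warn about.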

	Inference time is roughly the same for both methods: about 12 minutes per image (9:16 / 16:9) on an H20.
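As a rough sanity check on the memory figures above, the weight footprint can be estimated from the parameter count. The sketch below assumes the 80 billion total parameters stated in the Key Features section of this README, 16 bits per weight for an unquantized checkpoint, and 4 bits per weight after qint4 quantization. It ignores quantization scales, any layers left unquantized, activations, and caches, so real usage will be higher.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 80e9  # total parameter count quoted in Key Features

full_gb = weight_footprint_gb(TOTAL_PARAMS, 16)  # assumed 16-bit original weights
int4_gb = weight_footprint_gb(TOTAL_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: about {full_gb:.0f} GB")
print(f"4-bit weights:  about {int4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights alone come to about 40 GB, which is consistent with the roughly 50 GB of initial GPU usage reported above once overheads are included. The 160 GB estimate for 16-bit weights matches the CPU memory reported for loading method 1, although this README does not state that this is the cause.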

	<img src="./example.jpg" alt="Example Generated Image" width="800">

	==================================================================================

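	### HunyuanImage-3.0 is an outstanding omni-modal Mixture-of-Experts model! The introduction below is quoted from the official model page. The model and code provided in this project are for community sharing and technical research and study only; please comply with the terms of the official Tencent Hunyuan License.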

	==================================================================================

	<div align="center">

	<img src="./logo.png" alt="HunyuanImage-3.0 Logo" width="600">

	# 🎨 HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

	</div>


	<div align="center">
	<img src="./banner.png" alt="HunyuanImage-3.0 Banner" width="800">

	</div>

	<div align="center">
	<a href=https://hunyuan.tencent.com/image target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
	<a href=https://huggingface.co/tencent/HunyuanImage-3.0 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
	<a href=https://github.com/Tencent-Hunyuan/HunyuanImage-3.0 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
	<a href=https://arxiv.org/pdf/2509.23951 target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
	<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
	<a href=https://docs.qq.com/doc/DUVVadmhCdG9qRXBU target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a>
	</div>


	<p align="center">
	👏 Join our <a href="./assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> \|
	💻 <a href="https://hunyuan.tencent.com/modelSquare/home/play?modelId=289&from=/visual">Official website: try our model!</a>&nbsp;&nbsp;
	</p>

	## 🔥🔥🔥 News
	- September 28, 2025: 📖 HunyuanImage-3.0 Technical Report Released - Comprehensive technical documentation now available
	- September 28, 2025: 🚀 HunyuanImage-3.0 Open Source Release - Inference code and model weights publicly available


	## 🧩 Community Contributions

	If you develop with or use HunyuanImage-3.0 in your projects, please let us know.

	## 📑 Open-source Plan

	- HunyuanImage-3.0 (Image Generation Model)
	  - [x] Inference
	  - [x] HunyuanImage-3.0 Checkpoints
	  - [ ] HunyuanImage-3.0-Instruct Checkpoints (with reasoning)
	  - [ ] vLLM Support
	  - [ ] Distilled Checkpoints
	  - [ ] Image-to-Image Generation
	  - [ ] Multi-turn Interaction


	## 🗂️ Contents
	- [🔥🔥🔥 News](#-news)
	- [🧩 Community Contributions](#-community-contributions)
	- [📑 Open-source Plan](#-open-source-plan)
	- [📖 Introduction](#-introduction)
	- [✨ Key Features](#-key-features)
	- [🛠️ Dependencies and Installation](#-dependencies-and-installation)
	- [💻 System Requirements](#-system-requirements)
	- [📦 Environment Setup](#-environment-setup)
	- [📥 Install Dependencies](#-install-dependencies)
	- [Performance Optimizations](#performance-optimizations)
	- [🚀 Usage](#-usage)
	- [🔥 Quick Start with Transformers](#-quick-start-with-transformers)
	- [🏠 Local Installation & Usage](#-local-installation--usage)
	- [🎨 Interactive Gradio Demo](#-interactive-gradio-demo)
	- [🧱 Models Cards](#-models-cards)
	- [📝 Prompt Guide](#-prompt-guide)
	- [Manually Writing Prompts](#manually-writing-prompts)
	- [System Prompt For Automatic Rewriting the Prompt](#system-prompt-for-automatic-rewriting-the-prompt)
	- [Advanced Tips](#advanced-tips)
	- [More Cases](#more-cases)
	- [📊 Evaluation](#-evaluation)
	- [📚 Citation](#-citation)
	- [🙏 Acknowledgements](#-acknowledgements)
	- [🌟🚀 Github Star History](#-github-star-history)

	---

	## 📖 Introduction

	HunyuanImage-3.0 is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image module achieves performance comparable to or surpassing leading closed-source models.


	<div align="center">
	<img src="./framework.png" alt="HunyuanImage-3.0 Framework" width="90%">
	</div>

	## ✨ Key Features

	* 🧠 Unified Multimodal Architecture: Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables a more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.

	* 🏆 The Largest Image Generation MoE Model: This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.

	* 🎨 Superior Image Generation Performance: Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.

	* 💭 Intelligent World-Knowledge Reasoning: The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.
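The sparse activation described above (13 billion of 80 billion parameters used per token) is typical of Mixture-of-Experts models, which route each token to only a few experts. A minimal sketch of generic top-k routing follows. The value of `TOP_K`, the stand-in gating scores, and the function names are assumptions for illustration; this is not confirmed to be the router HunyuanImage-3.0 uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return {i: probs[i] / kept for i in top}


NUM_EXPERTS = 64  # expert count quoted in Key Features
TOP_K = 8         # assumed for illustration; not stated in this README

# Stand-in gating scores for one token; a real router computes these
# from the token's hidden state.
gate_scores = [math.sin(i) for i in range(NUM_EXPERTS)]
routing = route_top_k(gate_scores, TOP_K)

# Share of parameters active per token, from the figures quoted above.
active_fraction = 13e9 / 80e9

print("experts chosen:", sorted(routing))
print(f"active parameter share: {active_fraction:.2%}")
```

Only the selected experts run for a given token, which is how a model with a large total parameter count can keep per-token compute closer to that of a much smaller dense model.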

	## 📚 Citation

	If you find HunyuanImage-3.0 useful in your research, please cite our work:

	```bibtex
	@article{cao2025hunyuanimage,
	title={HunyuanImage 3.0 Technical Report},
	author={Cao, Siyu and Chen, Hangting and Chen, Peng and Cheng, Yiji and Cui, Yutao and Deng, Xinchi and Dong, Ying and Gong, Kipper and Gu, Tianpeng and Gu, Xiusen and others},
	journal={arXiv preprint arXiv:2509.23951},
	year={2025}
	}
	```

	## 🙏 Acknowledgements

	We extend our heartfelt gratitude to the following open-source projects and communities for their invaluable contributions:

	* 🤗 [Transformers](https://github.com/huggingface/transformers) - State-of-the-art NLP library
	* 🎨 [Diffusers](https://github.com/huggingface/diffusers) - Diffusion models library
	* 🌐 [HuggingFace](https://huggingface.co/) - AI model hub and community
	* ⚡ [FlashAttention](https://github.com/Dao-AILab/flash-attention) - Memory-efficient attention
	* 🚀 [FlashInfer](https://github.com/flashinfer-ai/flashinfer) - Optimized inference engine

	## 🌟🚀 Github Star History

	[![GitHub stars](https://img.shields.io/github/stars/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
	[![GitHub forks](https://img.shields.io/github/forks/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)


	[![Star History Chart](https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanImage-3.0&type=Date)](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)