Update README.md
README.md
CHANGED
@@ -5,57 +5,47 @@ license_link: LICENSE
 pipeline_tag: any-to-any
 library_name: transformers
 tags:
-- multimodal
 - text-to-image
+- text-and-image-to-image
+- multimodal
 - unified-model
+language:
+- en
+base_model:
+- deepseek-ai/Janus-Pro-7B
+datasets:
+- FreedomIntelligence/ShareGPT-4o-Image
 ---
-<!--
-## 1. Introduction
-
-Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.
-It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility.
-Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models.
-The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
-
-[**GitHub Repository**](https://github.com/deepseek-ai/Janus)

 <div align="center">
-<
+<h1>
+Janus-4o-7B
+</h1>
 </div>

 <div align="center">
-<
+<a href="https://github.com/FreedomIntelligence/ShareGPT-4o-Image" target="_blank">🧰GitHub</a> | <a href="https://arxiv.org/abs/2506.18095" target="_blank">📃Paper</a> | <a href="https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image" target="_blank">📚ShareGPT-4o-Image</a>
 </div>

-## 2. Model Summary
-
-Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation.
-Janus-Pro is constructed based on DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
-
-For multimodal understanding, it uses [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from [LlamaGen](https://github.com/FoundationVision/LlamaGen) with a downsample rate of 16.
-
-## 3. Quick Start
-
-Please refer to the [**GitHub Repository**](https://github.com/deepseek-ai/Janus).
-
-## 4. License
-
-## 5. Citation
+## 1. Introduction
+
+Janus-4o is a multimodal large language model (MLLM) capable of both **text-to-image** and **text-and-image-to-image** generation. It is fine-tuned from [Janus-Pro-7B](https://huggingface.co/deepseek-ai/Janus-Pro-7B) on the [ShareGPT-4o-Image](https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image) dataset, which aligns Janus-Pro with GPT-4o's image generation capabilities. Compared to Janus-Pro, Janus-4o adds support for text-and-image-to-image generation and shows notable improvements on text-to-image tasks.
+
+## 2. Quick Start
+
+## 3. Citation
+
+If you find our dataset helpful, please consider citing our work:

 ```
-@
+@misc{chen2025sharegpt4oimagealigningmultimodalmodels,
+      title={ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation},
+      author={Junying Chen and Zhenyang Cai and Pengcheng Chen and Shunian Chen and Ke Ji and Xidong Wang and Yunjin Yang and Benyou Wang},
+      year={2025},
+      eprint={2506.18095},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2506.18095},
 }
-```
-
-## 6. Contact
-
-If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com). -->
+```
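
The removed model summary pins down the image-token budget: with 384 x 384 inputs and a generation tokenizer that downsamples by a factor of 16, each image corresponds to a 24 x 24 grid of discrete codes. A minimal sketch of that arithmetic (variable names are illustrative, not from the Janus codebase):

```python
# Token bookkeeping implied by the Janus-Pro model summary:
# a 384 x 384 image with a downsample rate of 16 becomes a
# 24 x 24 grid of discrete codes, i.e. 576 image tokens that the
# unified transformer predicts autoregressively.
image_size = 384       # resolution supported by the SigLIP-L encoder
downsample_rate = 16   # LlamaGen tokenizer downsampling factor

codes_per_side = image_size // downsample_rate  # 24
tokens_per_image = codes_per_side ** 2          # 576

print(codes_per_side, tokens_per_image)  # 24 576
```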
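
The new Quick Start section ships without code. Since Janus-4o is fine-tuned from Janus-Pro-7B, a reasonable starting point is the loading recipe from the [Janus repository](https://github.com/deepseek-ai/Janus); the sketch below assumes that recipe carries over unchanged, and the checkpoint id `FreedomIntelligence/Janus-4o-7B` is inferred from this card's title rather than stated in the diff:

```python
# Hedged sketch: load Janus-4o the way the deepseek-ai/Janus repository
# loads Janus-Pro checkpoints. Requires the `janus` package:
#   pip install git+https://github.com/deepseek-ai/Janus
import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor

model_path = "FreedomIntelligence/Janus-4o-7B"  # assumed repo id for this card

# The processor bundles the chat template, image preprocessing, and tokenizer.
processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = processor.tokenizer

# The unified model handles both understanding and generation pathways.
model: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True
)
model = model.to(torch.bfloat16).cuda().eval()  # bf16 on GPU for a 7B model
```

The text-to-image and text-and-image-to-image sampling loops themselves are longer (classifier-free guidance over the 576 image tokens, then decoding through the visual tokenizer); see the linked GitHub repositories for complete scripts.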