jymcc committed 1b4c6a3 (verified) · 1 parent: 3599e78

Update README.md

Files changed (1): README.md (+26 -36)
@@ -5,57 +5,47 @@ license_link: LICENSE
  pipeline_tag: any-to-any
  library_name: transformers
  tags:
- - muiltimodal
  - text-to-image
  - unified-model
  ---
- <!--
- ## 1. Introduction
-
- Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.
- It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility.
- Janus-Pro surpasses previous unified model and matches or exceeds the performance of task-specific models.
- The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
-
- [**Github Repository**](https://github.com/deepseek-ai/Janus)

  <div align="center">
- <img alt="image" src="janus_pro_teaser1.png" style="width:90%;">
  </div>

  <div align="center">
- <img alt="image" src="janus_pro_teaser2.png" style="width:90%;">
  </div>


- ### 2. Model Summary
-
- Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation.
- Janus-Pro is constructed based on the DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
-
- For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from [here](https://github.com/FoundationVision/LlamaGen) with a downsample rate of 16.
-
-
-
- ## 3. Quick Start

- Please refer to [**Github Repository**](https://github.com/deepseek-ai/Janus)


- ## 4. License

- This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
- ## 5. Citation

  ```
- @article{chen2025janus,
- title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
- author={Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
- journal={arXiv preprint arXiv:2501.17811},
- year={2025}
  }
- ```
-
- ## 6. Contact
-
- If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com). -->
 
  pipeline_tag: any-to-any
  library_name: transformers
  tags:
  - text-to-image
+ - text-and-image-to-image
+ - muiltimodal
  - unified-model
+ language:
+ - en
+ base_model:
+ - deepseek-ai/Janus-Pro-7B
+ datasets:
+ - FreedomIntelligence/ShareGPT-4o-Image
  ---

  <div align="center">
+ <h1>
+ Janus-4o-7B
+ </h1>
  </div>

  <div align="center">
+ <a href="https://github.com/FreedomIntelligence/ShareGPT-4o-Image" target="_blank">🧰GitHub</a> | <a href="https://arxiv.org/abs/2506.18095" target="_blank">📃Paper</a> | <a href="https://arxiv.org/abs/2506.18095" target="_blank">📚ShareGPT-4o-Image</a>
  </div>

+ ## 1. Introduction

+ Janus-4o is a multimodal large language model (MLLM) capable of both **text-to-image** and **text-and-image-to-image** generation. It is fine-tuned from [Janus-Pro-7B](https://huggingface.co/deepseek-ai/Janus-Pro-7B) using the [ShareGPT-4o-Image](https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image) dataset to align Janus-Pro with GPT-4o image generation capabilities. Compared to Janus-Pro, Janus-4o newly supports text-and-image-to-image generation capabilities, along with notable improvements in text-to-image tasks.


+ ## 2. Quick Start

+ ## Citation

+ If you find our dataset helpful, please consider citing our work:

  ```
+ @misc{chen2025sharegpt4oimagealigningmultimodalmodels,
+ title={ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation},
+ author={Junying Chen and Zhenyang Cai and Pengcheng Chen and Shunian Chen and Ke Ji and Xidong Wang and Yunjin Yang and Benyou Wang},
+ year={2025},
+ eprint={2506.18095},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2506.18095},
  }
+ ```
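The hunk header `@@ -5,57 +5,47 @@` says the old README region starts at line 5 and spans 57 lines, while the new region starts at line 5 and spans 47. Combined with the `+26 -36` summary, both sides should agree on the number of unchanged context lines: 57 - 36 = 47 - 26 = 21. The Python sketch below checks that arithmetic. It assumes standard unified-diff header syntax and that this single hunk carries all of the file's changes; the helper names `parse_hunk_header` and `context_lines` are hypothetical and not part of Git or any Hugging Face tooling.

```python
import re

# Matches a unified-diff hunk header such as "@@ -5,57 +5,47 @@".
# Per the unified-diff convention, an omitted ",len" part means a length of 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) for a hunk header line."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = m.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def context_lines(old_len: int, new_len: int, added: int, removed: int) -> int:
    """Count lines shared by both sides; the two computations must agree.

    Assumes (hypothetically) that `added`/`removed` cover exactly this hunk.
    """
    from_old = old_len - removed  # old side = context + removed lines
    from_new = new_len - added    # new side = context + added lines
    if from_old != from_new:
        raise ValueError(f"inconsistent hunk: {from_old} != {from_new}")
    return from_old


old_start, old_len, new_start, new_len = parse_hunk_header(
    "@@ -5,57 +5,47 @@ license_link: LICENSE"
)
# The "+26 -36" figures come from the file summary line of this commit.
shared = context_lines(old_len, new_len, added=26, removed=36)
print(old_start, old_len, new_start, new_len, shared)  # 5 57 5 47 21
```

Computing the context count from both sides independently is a cheap sanity test for a flattened split view: if the two numbers disagree, lines were probably lost or duplicated during extraction.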