Update README.md
README.md
CHANGED
@@ -5,57 +5,47 @@ license_link: LICENSE
 pipeline_tag: any-to-any
 library_name: transformers
 tags:
-- multimodal
 - text-to-image
+- text-and-image-to-image
+- multimodal
 - unified-model
+language:
+- en
+base_model:
+- deepseek-ai/Janus-Pro-7B
+datasets:
+- FreedomIntelligence/ShareGPT-4o-Image
 ---
-<!--
-## 1. Introduction
-
-Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.
-It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility.
-Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models.
-The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
-
-[**GitHub Repository**](https://github.com/deepseek-ai/Janus)

 <div align="center">
-<
+<h1>
+Janus-4o-7B
+</h1>
 </div>

 <div align="center">
-<
+<a href="https://github.com/FreedomIntelligence/ShareGPT-4o-Image" target="_blank">🧰GitHub</a> | <a href="https://arxiv.org/abs/2506.18095" target="_blank">📃Paper</a> | <a href="https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image" target="_blank">📚ShareGPT-4o-Image</a>
 </div>

-## 2. Model Summary
-
-Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation.
-Janus-Pro is constructed based on DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
-
-For multimodal understanding, it uses [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from [LlamaGen](https://github.com/FoundationVision/LlamaGen) with a downsample rate of 16.
-
-## 3. Quick Start
-
-Please refer to the [**GitHub Repository**](https://github.com/deepseek-ai/Janus).
-
-## 4. License
-
-## 5. Citation
+## 1. Introduction
+
+Janus-4o is a multimodal large language model (MLLM) capable of both **text-to-image** and **text-and-image-to-image** generation. It is fine-tuned from [Janus-Pro-7B](https://huggingface.co/deepseek-ai/Janus-Pro-7B) on the [ShareGPT-4o-Image](https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image) dataset, which aligns Janus-Pro with GPT-4o's image generation capabilities. Compared to Janus-Pro, Janus-4o adds support for text-and-image-to-image generation and shows notable improvements on text-to-image tasks.
+
+## 2. Quick Start
+
+## 3. Citation
+
+If you find our dataset helpful, please consider citing our work:

 ```
-@
+@misc{chen2025sharegpt4oimagealigningmultimodalmodels,
+      title={ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation},
+      author={Junying Chen and Zhenyang Cai and Pengcheng Chen and Shunian Chen and Ke Ji and Xidong Wang and Yunjin Yang and Benyou Wang},
+      year={2025},
+      eprint={2506.18095},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2506.18095},
 }
-```
-
-## 6. Contact
-
-If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com). -->
+```
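
The removed model summary pins down the image-token budget: with 384 x 384 inputs and a generation tokenizer that downsamples by a factor of 16, each image corresponds to a 24 x 24 grid of discrete codes. A minimal sketch of that arithmetic (variable names are illustrative, not from the Janus codebase):

```python
# Token bookkeeping implied by the Janus-Pro model summary:
# a 384 x 384 image with a downsample rate of 16 becomes a
# 24 x 24 grid of discrete codes, i.e. 576 image tokens that the
# unified transformer predicts autoregressively.
image_size = 384       # resolution supported by the SigLIP-L encoder
downsample_rate = 16   # LlamaGen tokenizer downsampling factor

codes_per_side = image_size // downsample_rate  # 24
tokens_per_image = codes_per_side ** 2          # 576

print(codes_per_side, tokens_per_image)  # 24 576
```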
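
The new Quick Start section ships without code. Since Janus-4o is fine-tuned from Janus-Pro-7B, a reasonable starting point is the loading recipe from the [Janus repository](https://github.com/deepseek-ai/Janus); the sketch below assumes that recipe carries over unchanged, and the checkpoint id `FreedomIntelligence/Janus-4o-7B` is inferred from this card's title rather than stated in the diff:

```python
# Hedged sketch: load Janus-4o the way the deepseek-ai/Janus repository
# loads Janus-Pro checkpoints. Requires the `janus` package:
#   pip install git+https://github.com/deepseek-ai/Janus
import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor

model_path = "FreedomIntelligence/Janus-4o-7B"  # assumed repo id for this card

# The processor bundles the chat template, image preprocessing, and tokenizer.
processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = processor.tokenizer

# The unified model handles both understanding and generation pathways.
model: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True
)
model = model.to(torch.bfloat16).cuda().eval()  # bf16 on GPU for a 7B model
```

The text-to-image and text-and-image-to-image sampling loops themselves are longer (classifier-free guidance over the 576 image tokens, then decoding through the visual tokenizer); see the linked GitHub repositories for complete scripts.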