YangYunjin committed on
Commit
5d87d58
·
1 Parent(s): 9850215
._.gitattributes ADDED
Binary file (4.1 kB). View file
 
._README.md ADDED
Binary file (4.1 kB). View file
 
._config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._preprocessor_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._processor_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
._tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
3
+ size 4096
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
36
+ *.json filter=lfs diff=lfs merge=lfs -text
37
+ *.memmap filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,61 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: mit
3
+ license_name: deepseek
4
+ license_link: LICENSE
5
+ pipeline_tag: any-to-any
6
+ library_name: transformers
7
+ tags:
8
+ - multimodal
9
+ - text-to-image
10
+ - unified-model
11
+ ---
12
+
13
+ ## 1. Introduction
14
+
15
+ Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.
16
+ It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility.
17
+ Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models.
18
+ The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
19
+
20
+ [**GitHub Repository**](https://github.com/deepseek-ai/Janus)
21
+
22
+ <div align="center">
23
+ <img alt="image" src="janus_pro_teaser1.png" style="width:90%;">
24
+ </div>
25
+
26
+ <div align="center">
27
+ <img alt="image" src="janus_pro_teaser2.png" style="width:90%;">
28
+ </div>
29
+
30
+
31
+ ## 2. Model Summary
32
+
33
+ Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation.
34
+ Janus-Pro is built on DeepSeek-LLM-1.5b-base or DeepSeek-LLM-7b-base.
35
+
36
+ For multimodal understanding, it uses [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the [LlamaGen](https://github.com/FoundationVision/LlamaGen) tokenizer with a downsample rate of 16.
37
+
38
+
39
+
40
+ ## 3. Quick Start
41
+
42
+ Please refer to the [**GitHub Repository**](https://github.com/deepseek-ai/Janus).
43
+
44
+
45
+ ## 4. License
46
+
47
+ This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
48
+ ## 5. Citation
49
+
50
+ ```
51
+ @article{chen2025janus,
52
+ title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
53
+ author={Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
54
+ journal={arXiv preprint arXiv:2501.17811},
55
+ year={2025}
56
+ }
57
+ ```
58
+
59
+ ## 6. Contact
60
+
61
+ If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:561fdcd22965ed9fab979259426fcf9831823a8540b6cad8717765918b1c50fd
3
+ size 1282
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5b52f0483b8569186115a6a1eba87446363dea1d1b3addef93a5b948f57cb3e
3
+ size 4916851534
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8c72e2cb11b6407f00e4e4ae4133fdb3cfbaec053e5d4d6b2aab60cd5e15349b
3
+ size 4947392496
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4fe889057c6f13bb1fe3a62eebdfb127f3acfaf874dc19299c45bea53745c7db
3
+ size 4976742608
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:34700500210eaed06cee767c5caca364277a2ec3b4ef9a539749ebbeaca986b8
3
+ size 89033
preprocessor_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4dae4dc1bda762bdc84b887b2d3339f21935a15ff716b3049490d96935f7f12a
3
+ size 346
processor_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:19b96079e21e3cf0409f7252431100892f2ab2f377c69845f5090670fc319dd2
3
+ size 334
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:42cbf6a44df7b2beed050aeb017d5d2e43e3c507d492fed797be8b939748798e
3
+ size 684
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:42bcf2c54739affa70425520f9f8eb48e7409cf515541c74594de4c412b7d5ad
3
+ size 7614107
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:520c2e09d31f3fdec13b5a521845dbda9b1b1235121d167183a9d632eb27c6a5
3
+ size 107870
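
Most files in this commit are stored as Git LFS pointers: three `key value` lines (`version`, `oid`, `size`) that stand in for the real content. The sketch below is a minimal, illustrative parser for that layout and is not part of the repository. The names `LfsPointer`, `parse_lfs_pointer`, and `SHARD_SIZES` are hypothetical; the sample pointer text and the shard sizes are copied from the `config.json` and `model-0000X-of-00003.safetensors` entries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str  # spec URL from the "version" line
    oid: str      # hex digest, without the "sha256:" prefix
    size: int     # size in bytes of the real (non-pointer) file


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected hash algorithm: {algo!r}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


# Pointer text as it appears for config.json in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:561fdcd22965ed9fab979259426fcf9831823a8540b6cad8717765918b1c50fd\n"
    "size 1282\n"
)

# Sizes copied from the three weight-shard pointers in this commit.
SHARD_SIZES = [4916851534, 4947392496, 4976742608]
total_bytes = sum(SHARD_SIZES)

print(pointer.size, total_bytes)
```

Grouping parsed pointers by `oid` is one way to spot the `._*` entries in this commit: the ones stored as pointers all share a single digest and a size of 4096 bytes, so their contents are byte-identical, unlike the real model files.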