YangYunjin
committed on
Commit
·
5d87d58
1
Parent(s):
9850215
update
- ._.gitattributes +0 -0
- ._README.md +0 -0
- ._config.json +3 -0
- ._model-00001-of-00003.safetensors +3 -0
- ._model-00002-of-00003.safetensors +3 -0
- ._model-00003-of-00003.safetensors +3 -0
- ._model.safetensors.index.json +3 -0
- ._preprocessor_config.json +3 -0
- ._processor_config.json +3 -0
- ._special_tokens_map.json +3 -0
- ._tokenizer.json +3 -0
- ._tokenizer_config.json +3 -0
- .gitattributes +2 -0
- README.md +61 -3
- config.json +3 -0
- model-00001-of-00003.safetensors +3 -0
- model-00002-of-00003.safetensors +3 -0
- model-00003-of-00003.safetensors +3 -0
- model.safetensors.index.json +3 -0
- preprocessor_config.json +3 -0
- processor_config.json +3 -0
- special_tokens_map.json +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +3 -0
._.gitattributes
ADDED
Binary file (4.1 kB). View file
|
|
._README.md
ADDED
Binary file (4.1 kB). View file
|
|
._config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._model-00001-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._model-00002-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._model-00003-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._model.safetensors.index.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._preprocessor_config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._processor_config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._special_tokens_map.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._tokenizer.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
._tokenizer_config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:926cd45db5af3c3dc3bcdddf4841166ab9e10fb905a2f773127db99deb44b88e
|
3 |
+
size 4096
|
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
*.json filter=lfs diff=lfs merge=lfs -text
|
37 |
+
*.memmap filter=lfs diff=lfs merge=lfs -text
|
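The two patterns added to `.gitattributes` explain why the small JSON files in this commit show up as Git LFS pointers instead of readable JSON. The following is a minimal sketch of that matching, assuming Python's `fnmatch` is a close enough stand-in for gitattributes glob rules on simple basename patterns like these; the helper name `tracked_by_new_rules` is hypothetical.

```python
from fnmatch import fnmatch

# Patterns added by this commit (new lines 36-37 of .gitattributes).
NEW_LFS_PATTERNS = ["*.json", "*.memmap"]


def tracked_by_new_rules(filename: str) -> bool:
    """Approximate check: does a basename match one of the newly added LFS patterns?

    Assumption: fnmatch globbing is used here as an approximation of
    gitattributes matching, which is adequate for plain '*.ext' patterns.
    """
    return any(fnmatch(filename, pattern) for pattern in NEW_LFS_PATTERNS)


# File names taken from the commit's file list.
files = ["config.json", "tokenizer.json", "README.md"]
matches = {name: tracked_by_new_rules(name) for name in files}
print(matches)
```

Under this approximation, `config.json` and `tokenizer.json` match `*.json`, while `README.md` matches neither new pattern and stays a regular text file.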
README.md
CHANGED
@@ -1,3 +1,61 @@
|
|
1 |
-
---
|
2 |
-
license:
|
3 |
-
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
license_name: deepseek
|
4 |
+
license_link: LICENSE
|
5 |
+
pipeline_tag: any-to-any
|
6 |
+
library_name: transformers
|
7 |
+
tags:
|
8 |
+
- multimodal
|
9 |
+
- text-to-image
|
10 |
+
- unified-model
|
11 |
+
---
|
12 |
+
|
13 |
+
## 1. Introduction
|
14 |
+
|
15 |
+
Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.
|
16 |
+
It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility.
|
17 |
+
Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models.
|
18 |
+
The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
|
19 |
+
|
20 |
+
[**GitHub Repository**](https://github.com/deepseek-ai/Janus)
|
21 |
+
|
22 |
+
<div align="center">
|
23 |
+
<img alt="image" src="janus_pro_teaser1.png" style="width:90%;">
|
24 |
+
</div>
|
25 |
+
|
26 |
+
<div align="center">
|
27 |
+
<img alt="image" src="janus_pro_teaser2.png" style="width:90%;">
|
28 |
+
</div>
|
29 |
+
|
30 |
+
|
31 |
+
## 2. Model Summary
|
32 |
+
|
33 |
+
Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation.
|
34 |
+
Janus-Pro is built on DeepSeek-LLM-1.5b-base or DeepSeek-LLM-7b-base.
|
35 |
+
|
36 |
+
For multimodal understanding, it uses [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from [LlamaGen](https://github.com/FoundationVision/LlamaGen) with a downsample rate of 16.
|
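To make the downsample rate concrete, here is a small worked calculation of how many discrete tokens an image would map to. It is only a sketch: it assumes the generated image is also 384 x 384 (the README states that size only for the understanding encoder) and that a downsample rate of 16 means each 16 x 16 pixel patch becomes one token. The function name `token_grid` is hypothetical.

```python
from typing import Tuple


def token_grid(image_size: int, downsample: int) -> Tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image.

    Assumption: one token per downsample x downsample pixel patch.
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample
    return side, side * side


# 384 / 16 = 24 tokens per side, so 24 * 24 = 576 tokens per image.
side, total = token_grid(384, 16)
print(side, total)
```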
37 |
+
|
38 |
+
|
39 |
+
|
40 |
+
## 3. Quick Start
|
41 |
+
|
42 |
+
Please refer to the [**GitHub Repository**](https://github.com/deepseek-ai/Janus).
|
43 |
+
|
44 |
+
|
45 |
+
## 4. License
|
46 |
+
|
47 |
+
This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
|
48 |
+
## 5. Citation
|
49 |
+
|
50 |
+
```
|
51 |
+
@article{chen2025janus,
|
52 |
+
title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
|
53 |
+
author={Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
|
54 |
+
journal={arXiv preprint arXiv:2501.17811},
|
55 |
+
year={2025}
|
56 |
+
}
|
57 |
+
```
|
58 |
+
|
59 |
+
## 6. Contact
|
60 |
+
|
61 |
+
If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
|
config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:561fdcd22965ed9fab979259426fcf9831823a8540b6cad8717765918b1c50fd
|
3 |
+
size 1282
|
model-00001-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:c5b52f0483b8569186115a6a1eba87446363dea1d1b3addef93a5b948f57cb3e
|
3 |
+
size 4916851534
|
model-00002-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:8c72e2cb11b6407f00e4e4ae4133fdb3cfbaec053e5d4d6b2aab60cd5e15349b
|
3 |
+
size 4947392496
|
model-00003-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4fe889057c6f13bb1fe3a62eebdfb127f3acfaf874dc19299c45bea53745c7db
|
3 |
+
size 4976742608
|
model.safetensors.index.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:34700500210eaed06cee767c5caca364277a2ec3b4ef9a539749ebbeaca986b8
|
3 |
+
size 89033
|
preprocessor_config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4dae4dc1bda762bdc84b887b2d3339f21935a15ff716b3049490d96935f7f12a
|
3 |
+
size 346
|
processor_config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:19b96079e21e3cf0409f7252431100892f2ab2f377c69845f5090670fc319dd2
|
3 |
+
size 334
|
special_tokens_map.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:42cbf6a44df7b2beed050aeb017d5d2e43e3c507d492fed797be8b939748798e
|
3 |
+
size 684
|
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:42bcf2c54739affa70425520f9f8eb48e7409cf515541c74594de4c412b7d5ad
|
3 |
+
size 7614107
|
tokenizer_config.json
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:520c2e09d31f3fdec13b5a521845dbda9b1b1235121d167183a9d632eb27c6a5
|
3 |
+
size 107870
|
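Each added file in this commit is stored as a three-line Git LFS pointer (`version`, `oid`, `size`) rather than the file content itself. The sketch below shows one way to read such a pointer and to total the three weight shard sizes listed in the diff. The helper name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple `key value` layout seen here; it does not validate against the full LFS specification.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer of the 'key value' per-line form shown in this diff."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer content copied from the config.json entry in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:561fdcd22965ed9fab979259426fcf9831823a8540b6cad8717765918b1c50fd
size 1282
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])

# Sizes of the three safetensors shards, copied from their pointers above.
shard_sizes = [4916851534, 4947392496, 4976742608]
total_bytes = sum(shard_sizes)
print(f"{total_bytes} bytes, about {total_bytes / 1e9:.2f} GB of weights")
```

The `size` field is what lets a client know how much data to download before fetching the object identified by `oid`.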