Update README.md
README.md
CHANGED
|
@@ -85,7 +85,7 @@ from PaDT import PaDTForConditionalGeneration, VisonTextProcessingClass, parseVR
|
|
| 85 |
|
| 86 |
|
| 87 |
TEST_IMG_PATH="./eval/imgs/000000368335.jpg"
|
| 88 |
-
MODEL_PATH="PaDT-MLLM/
|
| 89 |
|
| 90 |
# load model
|
| 91 |
model = PaDTForConditionalGeneration.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map={"": 0})
|
|
@@ -97,7 +97,7 @@ processor = VisonTextProcessingClass(processor, model.config.vision_config.spati
|
|
| 97 |
processor.prepare(model.model.embed_tokens.weight.shape[0])
|
| 98 |
|
| 99 |
# question prompt
|
| 100 |
-
PROMPT = "Please
|
| 101 |
|
| 102 |
# construct conversation
|
| 103 |
message = [
|
|
@@ -187,6 +187,65 @@ Here are some randomly selected test examples showcasing PaDT’s excellent perf
|
|
| 187 |
<img src="./assets/TAM.webp" width="900"/>
|
| 188 |
</div>
|
| 189 |
|
| 190 |
## License Agreement
|
| 191 |
|
| 192 |
PaDT is licensed under Apache 2.0.
|
|
|
|
| 85 |
|
| 86 |
|
| 87 |
TEST_IMG_PATH="./eval/imgs/000000368335.jpg"
|
| 88 |
+
MODEL_PATH="PaDT-MLLM/PaDT_REC_3B"
|
| 89 |
|
| 90 |
# load model
|
| 91 |
model = PaDTForConditionalGeneration.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map={"": 0})
|
|
|
|
| 97 |
processor.prepare(model.model.embed_tokens.weight.shape[0])
|
| 98 |
|
| 99 |
# question prompt
|
| 100 |
+
PROMPT = """Please carefully check the image and detect the object this sentence describes: "The car is on the left side of the horse"."""
|
| 101 |
|
| 102 |
# construct conversation
|
| 103 |
message = [
|
|
|
|
| 187 |
<img src="./assets/TAM.webp" width="900"/>
|
| 188 |
</div>
|
| 189 |
|
| 190 |
+
## Training Instructions
|
| 191 |
+
|
| 192 |
+
Download Datasets:
|
| 193 |
+
|
| 194 |
+
- [COCO](https://cocodataset.org/#home)
|
| 195 |
+
|
| 196 |
+
- RefCOCO/+/g
|
| 197 |
+
```bash
|
| 198 |
+
wget https://web.archive.org/web/20220413011718/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip
|
| 199 |
+
wget https://web.archive.org/web/20220413011656/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip
|
| 200 |
+
wget https://web.archive.org/web/20220413012904/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip
|
| 201 |
+
```
|
| 202 |
+
|
| 203 |
+
Unpack the datasets and arrange them in the following directory layout:
|
| 204 |
+
|
| 205 |
+
```
|
| 206 |
+
PaDT/
|
| 207 |
+
├── dataset/
|
| 208 |
+
│   ├── coco/
|
| 209 |
+
│   │   ├── annotations/
|
| 210 |
+
│   │   ├── train2014/
|
| 211 |
+
│   │   ├── train2017/
|
| 212 |
+
│   │   ├── val2014/
|
| 213 |
+
│   │   └── val2017/
|
| 214 |
+
│   └── RefCOCO/
|
| 215 |
+
│       ├── refcoco/
|
| 216 |
+
│       ├── refcoco+/
|
| 217 |
+
│       └── refcocog/
|
| 218 |
+
```
|
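The unpacking step and the layout above can be sanity-checked with a short script. The following is a minimal sketch, not part of the PaDT repository: the helper names `unpack_refcoco` and `missing_dirs` are hypothetical, and it assumes that each archive extracts to a top-level folder named after itself (`refcoco/`, `refcoco+/`, `refcocog/`) and that the COCO folders are placed manually.

```python
import zipfile
from pathlib import Path

# Expected sub-directories, copied from the directory tree above.
EXPECTED_DIRS = [
    "coco/annotations",
    "coco/train2014",
    "coco/train2017",
    "coco/val2014",
    "coco/val2017",
    "RefCOCO/refcoco",
    "RefCOCO/refcoco+",
    "RefCOCO/refcocog",
]


def unpack_refcoco(archive_dir: Path, dataset_root: Path) -> None:
    """Extract any refcoco*.zip found in archive_dir into dataset_root/RefCOCO."""
    target = dataset_root / "RefCOCO"
    target.mkdir(parents=True, exist_ok=True)
    for name in ("refcoco.zip", "refcoco+.zip", "refcocog.zip"):
        archive = archive_dir / name
        if archive.is_file():  # skip archives that have not been downloaded
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)


def missing_dirs(dataset_root: Path) -> list:
    """Return the expected sub-directories that are absent under dataset_root."""
    return [d for d in EXPECTED_DIRS if not (dataset_root / d).is_dir()]
```

For instance, calling `unpack_refcoco(Path("."), Path("dataset"))` from the `PaDT/` root and then `missing_dirs(Path("dataset"))` lists whatever is still absent before preprocessing starts.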
| 219 |
+
|
| 220 |
+
Preprocess the datasets:
|
| 221 |
+
- 1. Preprocess with our scripts. (First update the dataset paths configured in the preprocessing scripts.)
|
| 222 |
+
```bash
|
| 223 |
+
cd src/preprocess
|
| 224 |
+
python process_coco.py
|
| 225 |
+
python process_refcoco.py
|
| 226 |
+
```
|
| 227 |
+
- 2. Alternatively, use the preprocessed, training-ready datasets that we have released on Hugging Face.
|
| 228 |
+
|
| 229 |
+
| Dataset | Dataset Path | Task Type |
|
| 230 |
+
| - | - | - |
|
| 231 |
+
| COCO | [PaDT-MLLM/COCO](https://huggingface.co/datasets/PaDT-MLLM/COCO) | Open Vocabulary Detection |
|
| 232 |
+
| RefCOCO | [PaDT-MLLM/RefCOCO](https://huggingface.co/datasets/PaDT-MLLM/RefCOCO) | Referring Expression Comprehension/Segmentation |
|
| 233 |
+
| RIC | [PaDT-MLLM/ReferringImageCaptioning](https://huggingface.co/datasets/PaDT-MLLM/ReferringImageCaptioning) | Referring Image Captioning |
|
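The preprocessed datasets in the table above could be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the helper names and the one-folder-per-dataset local layout are hypothetical, and the directory that the PaDT training scripts actually expect should be checked in `run_scripts`.

```python
from pathlib import Path

# Dataset repositories listed in the table above.
DATASET_REPOS = [
    "PaDT-MLLM/COCO",
    "PaDT-MLLM/RefCOCO",
    "PaDT-MLLM/ReferringImageCaptioning",
]


def local_dir_for(repo_id: str, root: Path) -> Path:
    """Map 'PaDT-MLLM/COCO' to '<root>/COCO' (a hypothetical local layout)."""
    return root / repo_id.split("/", 1)[1]


def download_all(root: Path) -> None:
    """Download every listed dataset repository into its own folder under root."""
    # Imported lazily so local_dir_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in DATASET_REPOS:
        snapshot_download(
            repo_id=repo_id,
            repo_type="dataset",  # these are dataset repos, not model repos
            local_dir=str(local_dir_for(repo_id, root)),
        )
```

Calling `download_all(Path("dataset_preprocessed"))` would place each dataset in its own sub-folder; the folder name here is only an illustration.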
| 234 |
+
|
| 235 |
+
|
| 236 |
+
The training scripts in `run_scripts` are ready to execute.
|
| 237 |
+
|
| 238 |
+
For example, to train the PaDT-Pro 3B model on a single node with 8×96 GB GPUs:
|
| 239 |
+
|
| 240 |
+
```bash
|
| 241 |
+
bash ./run_scripts/padt_pro_3b_sft.sh
|
| 242 |
+
```
|
| 243 |
+
|
| 244 |
+
## Evaluation
|
| 245 |
+
|
| 246 |
+
We provide a simple inference example in `eval/test_demo.py`. More evaluation scripts will be added soon.
|
| 247 |
+
|
| 248 |
+
|
| 249 |
## License Agreement
|
| 250 |
|
| 251 |
PaDT is licensed under Apache 2.0.
|