PaDT-MLLM committed
Commit b3f4f08 · verified · 1 Parent(s): 54e4bb9

Update README.md

Files changed (1): README.md +61 −2
README.md CHANGED
@@ -85,7 +85,7 @@ from PaDT import PaDTForConditionalGeneration, VisonTextProcessingClass, parseVR
 
 
 TEST_IMG_PATH="./eval/imgs/000000368335.jpg"
-MODEL_PATH="PaDT-MLLM/PaDT_Pro_3B"
+MODEL_PATH="PaDT-MLLM/PaDT_REC_3B"
 
 # load model
 model = PaDTForConditionalGeneration.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map={"": 0})
@@ -97,7 +97,7 @@ processor = VisonTextProcessingClass(processor, model.config.vision_config.spati
 processor.prepare(model.model.embed_tokens.weight.shape[0])
 
 # question prompt
-PROMPT = "Please describe this image."
+PROMPT = """Please carefully check the image and detect the object this sentence describes: "The car is on the left side of the horse"."""
 
 # construct conversation
 message = [
@@ -187,6 +187,65 @@ Here are some randomly selected test examples showcasing PaDT’s excellent perf
 <img src="./assets/TAM.webp" width="900"/>
 </div>
 
+## Training Instruction
+
+Download the datasets:
+
+- [COCO](https://cocodataset.org/#home)
+
+- RefCOCO/+/g
+```bash
+wget https://web.archive.org/web/20220413011718/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip
+wget https://web.archive.org/web/20220413011656/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip
+wget https://web.archive.org/web/20220413012904/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip
+```
+
+Unpack the datasets and place them under the following directory:
+
+```
+PaDT/
+├── dataset/
+│   ├── coco/
+│   │   ├── annotations/
+│   │   ├── train2014/
+│   │   ├── train2017/
+│   │   ├── val2014/
+│   │   └── val2017/
+│   └── RefCOCO/
+│       ├── refcoco/
+│       ├── refcoco+/
+│       └── refcocog/
+```
+
+Preprocess the datasets in either of two ways:
+1. Preprocess via our scripts (first update the dataset path configuration in the preprocessing scripts):
+```bash
+cd src/preprocess
+python process_coco.py
+python process_refcoco.py
+```
+2. Alternatively, use the preprocessed, training-ready datasets we released on Hugging Face:
+
+| Dataset | Dataset Path | Task Type |
+| - | - | - |
+| COCO | [PaDT-MLLM/COCO](https://huggingface.co/datasets/PaDT-MLLM/COCO) | Open Vocabulary Detection |
+| RefCOCO | [PaDT-MLLM/RefCOCO](https://huggingface.co/datasets/PaDT-MLLM/RefCOCO) | Referring Expression Comprehension/Segmentation |
+| RIC | [PaDT-MLLM/ReferringImageCaptioning](https://huggingface.co/datasets/PaDT-MLLM/ReferringImageCaptioning) | Referring Image Captioning |
+
+
+The training scripts in `run_scripts` are ready to execute.
+
+For example, to train the PaDT-Pro 3B model on a single node with 8×96 GB GPUs:
+
+```bash
+bash ./run_scripts/padt_pro_3b_sft.sh
+```
+
+## Evaluation
+
+We provide a simple inference example in `eval/test_demo.py`. More evaluation scripts will be added soon.
+
+
 ## License Agreement
 
 PaDT is licensed under Apache 2.0.
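
---

The commit switches the quick-start demo from generic captioning to a referring-expression prompt. As a minimal sketch (not taken from the PaDT repo), the `message` conversation that the snippet goes on to construct presumably follows a Qwen2-VL-style chat format; the field names below are assumptions, not confirmed against the PaDT source:

```python
# Hedged sketch of the conversation list the README snippet builds.
# Assumes a Qwen2-VL-style chat format; the exact field names are
# assumptions, not confirmed by the PaDT source.
TEST_IMG_PATH = "./eval/imgs/000000368335.jpg"
PROMPT = ('Please carefully check the image and detect the object this '
          'sentence describes: "The car is on the left side of the horse".')

message = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": TEST_IMG_PATH},  # local demo image
            {"type": "text", "text": PROMPT},           # REC-style instruction
        ],
    }
]
```

The single-turn, image-then-text ordering matches how vision-language chat templates typically interleave modalities.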
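The dataset tree added in this commit is easy to get wrong, and the preprocessing scripts expect it in place. A small pre-flight check can catch missing folders before running them; `check_dataset_layout` below is a hypothetical helper sketched for this purpose, not part of the PaDT repository:

```python
from pathlib import Path

# Expected sub-directories, mirroring the README's "Training Instruction"
# layout. `check_dataset_layout` is a hypothetical helper, not PaDT code.
EXPECTED_DIRS = [
    "dataset/coco/annotations",
    "dataset/coco/train2014",
    "dataset/coco/train2017",
    "dataset/coco/val2014",
    "dataset/coco/val2017",
    "dataset/RefCOCO/refcoco",
    "dataset/RefCOCO/refcoco+",
    "dataset/RefCOCO/refcocog",
]

def check_dataset_layout(root):
    """Return the expected sub-directories missing under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Running `check_dataset_layout("PaDT")` before `python process_coco.py` would list any directories still to be unpacked.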