Taming Generative Synthetic Data for X-ray Prohibited Item Detection
This repository contains the Xsyn model, a one-stage X-ray security image synthesis pipeline based on text-to-image generation. Proposed in the paper Taming Generative Synthetic Data for X-ray Prohibited Item Detection, Xsyn addresses data insufficiency for prohibited item detection by incorporating two effective strategies: Cross-Attention Refinement (CAR) for refining bounding box annotations and Background Occlusion Modeling (BOM) for enhancing imaging complexity. It aims to achieve high-quality X-ray security image synthesis without incurring additional labor-intensive foreground preparation.
Code repository: https://github.com/pILLOW-1/Xsyn/
Download Xsyn models
Checkpoints for different datasets are available. All models here are based on GLIGEN.
| Dataset | Mode | Download |
|---|---|---|
| PIDray | text-grounded inpainting | HF Hub |
| OPIXray | text-grounded inpainting | HF Hub |
| HiXray | text-grounded inpainting | HF Hub |
Inference
We provide a single script that generates X-ray security images and constructs their annotations. First download the checkpoints and place them at the location you pass as --ckpt_path, then run:
python gligen_inference.py
Details of some important args:
- --output_path: the path to save your generated X-ray security images
- --annotation_path: the path to save the refined annotation (stored in txt format)
- --vis_path: the path to save visualization compared with gt
- --ca_vis_path: the path to save cross-attention maps
- --image_path: the path to load images you want to inpaint
- --ckpt_path: the generation model checkpoint path
- --gligen_caption_pt: the file to prepare your training/test data in GLIGEN format
- --gen_method: set to 1 for Xsyn-M and 3 for Xsyn-A
- --refine_anno: set to True for CAR
- --latent_redist: set to True for BOM
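As a rough illustration of how the arguments above might be combined, the sketch below assembles a command line for gligen_inference.py without running it. The helper name build_inference_command, the variant labels used as dictionary keys, and every path are hypothetical placeholders. The way boolean options are passed (a literal True after the flag) is an assumption based on the descriptions above, so check the script's argument parser before relying on it.

```python
import shlex

# Values the argument list above documents for --gen_method.
GEN_METHODS = {"Xsyn-M": 1, "Xsyn-A": 3}


def build_inference_command(variant, ckpt_path, image_path, output_path,
                            annotation_path, use_car=True, use_bom=True):
    """Return the argv list for one gligen_inference.py run (hypothetical helper)."""
    if variant not in GEN_METHODS:
        raise ValueError(f"variant must be one of {sorted(GEN_METHODS)}")
    cmd = [
        "python", "gligen_inference.py",
        "--ckpt_path", str(ckpt_path),
        "--image_path", str(image_path),
        "--output_path", str(output_path),
        "--annotation_path", str(annotation_path),
        "--gen_method", str(GEN_METHODS[variant]),
    ]
    # Assumption: boolean options take a literal "True" on the command line.
    if use_car:
        cmd += ["--refine_anno", "True"]    # Cross-Attention Refinement
    if use_bom:
        cmd += ["--latent_redist", "True"]  # Background Occlusion Modeling
    return cmd


# All paths below are placeholders, not locations shipped with the repository.
cmd = build_inference_command(
    "Xsyn-A",
    ckpt_path="checkpoints/example",
    image_path="data/images",
    output_path="outputs/images",
    annotation_path="outputs/annotations",
)
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run from the repository root once the paths point at real files. Keeping the command as a list rather than a single string avoids quoting problems when paths contain spaces.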
After inference, you can use downstream_test.sh to evaluate detection performance with our synthetic data. Our downstream detection environment is MMDetection.