SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model

This model, presented in the paper SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model, is a view-conditioned diffusion model that synthesizes multi-view X-ray images from a single view. The approach uses a Diffusion Transformer to preserve fine details and a weak-to-strong training strategy to keep high-resolution image generation stable, enabling higher-resolution outputs with improved control over viewing angles.

Code: https://github.com/xiechun298/SV-DRR

Visual Comparison with SOTA Methods

Usage

🚀 Quick Start

🛠️ Environment Setup

To ensure compatibility and reproducibility, follow these steps to set up the environment:

Clone the Repository:

```
git clone https://github.com/xiechun-tsukuba/svdrr.git
cd svdrr
```

Create a Conda Environment:
```
conda env create -f environment.yaml
```

⏬ Download Pretrained Models

You can download the pretrained models by either:

Option 1: Automated Download (Recommended)

```
python scripts/download_models.py
```

This will download all models into the models/ directory. Shared components will be stored in the shared/ folder, and symbolic links will be created in each model folder accordingly.
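The shared-folder layout described above can be pictured with a minimal Python sketch. It is not the code of scripts/download_models.py: the function name, the placement of shared/ under models/, and the component names passed in are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def link_shared_components(models_dir, model_names, component_names):
    """Store each component once in <models_dir>/shared and symlink it into every model folder.

    Hypothetical helper: the real script may lay out files differently.
    """
    models_dir = Path(models_dir)
    shared_dir = models_dir / "shared"
    shared_dir.mkdir(parents=True, exist_ok=True)
    for component in component_names:
        # Placeholder directory standing in for the downloaded component files.
        (shared_dir / component).mkdir(exist_ok=True)
    for model in model_names:
        model_dir = models_dir / model
        model_dir.mkdir(parents=True, exist_ok=True)
        for component in component_names:
            link = model_dir / component
            if not link.exists() and not link.is_symlink():
                # A relative target keeps the links valid if models/ is moved as a whole.
                link.symlink_to(Path("..") / "shared" / component, target_is_directory=True)


# Example call (component name "vae" is illustrative only):
# link_shared_components("models", ["DiT-fb-256", "DiT-fb-512"], ["vae"])
```

Because the links are relative, copying a single model folder elsewhere without shared/ would leave dangling links; copy the whole models/ tree instead. On Windows, creating symbolic links may require elevated privileges or developer mode.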

Option 2: Manual Download from Hugging Face

256 resolution: https://huggingface.co/xiechun-tsukuba/svdrr-dit-fb-256
512 resolution: https://huggingface.co/xiechun-tsukuba/svdrr-dit-fb-512
1024 resolution: https://huggingface.co/xiechun-tsukuba/svdrr-dit-fb-1024
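
As an alternative to downloading the files through the browser, the repositories listed above can be fetched with the huggingface_hub package. The sketch below is an assumption about a convenient workflow, not part of the SV-DRR code: the helper name download_model and the target folder are hypothetical, and the folder layout that test_svdrr_DiT.py expects should be checked against the repository.

```python
# Repository IDs taken from the links above.
REPO_IDS = {
    256: "xiechun-tsukuba/svdrr-dit-fb-256",
    512: "xiechun-tsukuba/svdrr-dit-fb-512",
    1024: "xiechun-tsukuba/svdrr-dit-fb-1024",
}


def download_model(resolution, local_dir):
    """Download one pretrained model into local_dir and return the local path.

    Hypothetical helper; requires `pip install huggingface_hub`.
    """
    if resolution not in REPO_IDS:
        raise ValueError(
            f"Unsupported resolution {resolution}; choose one of {sorted(REPO_IDS)}"
        )
    # Imported lazily so the mapping above can be used without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_IDS[resolution], local_dir=local_dir)


# Example call (the target folder name is an assumption):
# download_model(512, "models/DiT-fb-512")
```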

🔍 Inference

Important Note: The pose coordinate system of LIDC-IDRI-DRR is the reverse of the intuitive convention: the polar angle increases downward, and the azimuth angle increases when rotating to the left. To invert the pose coordinate system, pass the --flip_pose option.
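
To make the sign conventions concrete, here is a minimal sketch of what inverting the pose coordinate system could look like. It assumes that a pose is a (polar, azimuth) pair of angular offsets in degrees relative to the input view, so that reversing both directions amounts to negating both angles. The Pose type and the function name are hypothetical, and the actual behavior of --flip_pose should be confirmed in the repository.

```python
from typing import NamedTuple


class Pose(NamedTuple):
    """Hypothetical pose: angular offsets in degrees relative to the input view."""
    polar_deg: float
    azimuth_deg: float


def invert_pose_convention(pose):
    """Reverse the direction in which both angles increase (assumed: plain negation)."""
    return Pose(-pose.polar_deg, -pose.azimuth_deg)


# Under this assumption, a view 10 degrees "down" and 30 degrees "left" in the
# dataset convention maps to (-10, -30) in the reversed convention.
flipped = invert_pose_convention(Pose(10.0, 30.0))
```

Applying the function twice returns the original pose, which is a quick sanity check for any convention switch of this kind.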

Single Image Inference

Default views (azimuth angles from -90° to 90° in 5° increments):

```
python test_svdrr_DiT.py --model_path models/DiT-fb-512 \
    --image_path demo/real_xray.jpg \
    --log_dir outputs/ \
    --image_size 512 \
    --simple_pose
```
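
For reference, the default sweep described above (azimuth from -90° to 90° in 5° steps) can be enumerated as follows. This is only an illustration of the angle grid; it assumes both endpoints are included, and the function name is hypothetical rather than taken from test_svdrr_DiT.py.

```python
def default_azimuths(start=-90, stop=90, step=5):
    """Azimuth angles in degrees, assuming both endpoints are included."""
    return list(range(start, stop + 1, step))


angles = default_azimuths()
print(len(angles), angles[0], angles[-1])  # 37 -90 90
```

Under that assumption, a single input image yields 37 synthesized views, including the 0° view that corresponds to the input direction.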

User-specified views defined in camera_views.json:

```
python test_svdrr_DiT.py --model_path models/DiT-fb-512 \
    --image_path demo/real_xray.jpg \
    --log_dir outputs/ \
    --image_size 512 \
    --poses demo/camera_views.json
```

Dataset Inference

Perform inference on the LIDC-IDRI-DRR dataset:

```
python test_svdrr_DiT.py --model_path models/svdrr-DiT-fb-256 \
    --dataset {path/to/dataset/} \
    --log_dir outputs/ \
    --image_size 256
```

BibTeX

If you find this work useful, please cite it as follows:

@InProceedings{XieChu_SVDRR_MICCAI2025,
        author = {Xie, Chun and Yoshii, Yuichi and Kitahara, Itaru},
        title = {{SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},
        pages = {572--582},
        doi = {10.1007/978-3-032-04965-0_54}
}

@misc{xie2025svdrr,
        title = {SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model}, 
        author = {Chun Xie and Yuichi Yoshii and Itaru Kitahara},
        year = {2025},
        eprint = {2507.05148},
        archivePrefix = {arXiv},
        doi = {10.48550/arXiv.2507.05148},
}
