Bernini: Latent Semantic Planning for Video Diffusion
Chenchen Liu*, Junyi Chen*, Lei Li*, Lu Chi*§, Mingzhen Sun*, Zhuoying Li*, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan†
* Equal contribution, † Corresponding author, § Project lead
News
- [2026-06-10] We open-sourced the inference code and model weights of the full Bernini pipeline.
- [2026-05-22] We released our paper Bernini: Latent Semantic Planning for Video Diffusion.
Highlights
Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.
Compared with the renderer-only Bernini-R release, Bernini-Diffusers packages the full semantic-planning pipeline: a Qwen2.5-VL planner, Bernini planning weights, and Wan2.2 diffusion components in one self-contained directory. This makes it the recommended release when you need stronger instruction following, multi-step semantic planning, and better handling of complex video editing requests.
Model card
| Field | Description |
|---|---|
| Model type | Full video generation/editing pipeline with an MLLM-based semantic planner and a DiT-based renderer. |
| Checkpoint | ByteDance/Bernini-Diffusers |
| Code | ByteDance/Bernini |
| Recommended use | Complex generation/editing requests that benefit from explicit latent semantic planning and stronger instruction following. |
| Model behavior | Better at decomposing complex instructions and planning semantic changes before rendering, at the cost of a heavier checkpoint layout than Bernini-R. |
Benchmark snapshot
| Model | EditVerse | OpenVE | OpenS2V | VBench | Bernini-v2v (OS) | Bernini-vr2v (OS) |
|---|---|---|---|---|---|---|
| Bernini 7+14B | 8.02 | 4.03 | 62.30 | 84.37 | 3.49 | 3.48 |
On video editing, Bernini ranks in the first tier alongside leading closed-source commercial models in our internal arena evaluation, which is based on blind pairwise human comparisons.
Package layout
This release is a self-contained diffusers-format directory. Pass the downloaded Bernini-Diffusers directory directly to --config.
Bernini-Diffusers/
  bernini/
  mllm/
  scheduler/
  t5_text_encoder/
  t5_tokenizer/
  vae/
  config.json
  transformer_config.json
  transformer_2_config.json
At runtime:
- bernini/ provides the Bernini planning checkpoint.
- mllm/ provides the Qwen2.5-VL planner assets.
- transformer_config.json and transformer_2_config.json define the Wan2.2 diffusion decoder components used by the full pipeline.
- t5_text_encoder/, t5_tokenizer/, vae/, and scheduler/ provide the base diffusion modules required for inference.
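Before passing the directory to --config, it can help to confirm that the download is complete. The sketch below checks only for the top-level entries shown in the tree above; it does not inspect their contents or verify weights, and the names EXPECTED_ENTRIES and missing_entries are hypothetical helpers, not part of the Bernini code.

```python
from __future__ import annotations

from pathlib import Path

# Top-level entries from the package layout tree; a trailing "/" marks a directory.
EXPECTED_ENTRIES = [
    "bernini/",
    "mllm/",
    "scheduler/",
    "t5_text_encoder/",
    "t5_tokenizer/",
    "vae/",
    "config.json",
    "transformer_config.json",
    "transformer_2_config.json",
]


def missing_entries(root: str | Path) -> list[str]:
    """Return the expected entries that are absent from `root`, in layout order."""
    root = Path(root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories must exist as directories and files as regular files.
        ok = path.is_dir() if entry.endswith("/") else path.is_file()
        if not ok:
            missing.append(entry)
    return missing
```

For example, missing_entries("pretrained_models/Bernini-Diffusers") returns an empty list when every top-level entry is present.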
Download
pip install -U "huggingface_hub"
hf download ByteDance/Bernini-Diffusers \
--local-dir pretrained_models/Bernini-Diffusers
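If you prefer to download from Python, the same result can presumably be obtained with huggingface_hub's snapshot_download, sketched below. The helper name download_bernini and its config_only option are illustrative assumptions, not part of the Bernini repository; the target directory mirrors the hf download command above.

```python
def download_bernini(local_dir: str = "pretrained_models/Bernini-Diffusers",
                     config_only: bool = False) -> str:
    """Download ByteDance/Bernini-Diffusers into `local_dir` and return the local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="ByteDance/Bernini-Diffusers",
        local_dir=local_dir,
        # config_only=True fetches only *.json files (handy for inspecting the layout);
        # None means every file in the repository is downloaded.
        allow_patterns=["*.json"] if config_only else None,
    )
```

Calling download_bernini() fetches the full checkpoint, which is large; download_bernini(config_only=True) is a lighter way to check access and layout first.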
Usage
The official inference code is available in the Bernini repository.
Installation
git clone https://github.com/bytedance/Bernini.git bernini && cd bernini
pip install -r requirements.txt
Recommended environment:
- Python 3.11.2
- PyTorch 2.5.1+cu124
- CUDA toolkit 12.4
- GPU: Hopper GPUs (H100/H800/H200) are recommended for best performance
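A quick way to compare a machine against the recommended environment is sketched below. The thresholds come from the list above; using compute capability 9.0 as the Hopper cutoff follows standard CUDA numbering, and the helper names are hypothetical. Note that torch.version.cuda reports the CUDA version PyTorch was built with, which can differ from the installed toolkit.

```python
from __future__ import annotations

import sys

RECOMMENDED_PYTHON = (3, 11)
RECOMMENDED_TORCH = "2.5.1"
RECOMMENDED_CUDA = "12.4"
HOPPER_CAPABILITY = (9, 0)  # H100/H800/H200 report compute capability 9.0


def env_warnings(python: tuple[int, int], torch_version: str | None,
                 cuda_version: str | None,
                 capability: tuple[int, int] | None) -> list[str]:
    """Return human-readable deviations from the recommended environment."""
    issues = []
    if tuple(python[:2]) != RECOMMENDED_PYTHON:
        issues.append(f"Python {python[0]}.{python[1]} (recommended: 3.11)")
    if torch_version is None:
        issues.append("PyTorch is not installed")
        return issues  # CUDA and GPU checks need PyTorch
    if not torch_version.startswith(RECOMMENDED_TORCH):
        issues.append(f"PyTorch {torch_version} (recommended: {RECOMMENDED_TORCH})")
    if cuda_version != RECOMMENDED_CUDA:
        issues.append(f"PyTorch CUDA build {cuda_version} (recommended: {RECOMMENDED_CUDA})")
    if capability is None:
        issues.append("No CUDA GPU is visible")
    elif tuple(capability) < HOPPER_CAPABILITY:
        issues.append(f"GPU compute capability {capability[0]}.{capability[1]} is below Hopper (9.0)")
    return issues


def current_env_warnings() -> list[str]:
    """Collect the values from the running interpreter and, if present, PyTorch."""
    try:
        import torch
    except ImportError:
        return env_warnings(sys.version_info[:2], None, None, None)
    capability = torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None
    return env_warnings(sys.version_info[:2], torch.__version__, torch.version.cuda, capability)
```

Warnings are advisory: an older GPU may still run the pipeline, only more slowly or with less memory headroom.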
For multi-GPU sequence parallel inference, install VeOmni:
pip install --no-deps git+https://github.com/ByteDance-Seed/VeOmni.git@v0.1.10
Load the model
Pass the downloaded directory directly as --config:
python infer_single_gpu.py --config pretrained_models/Bernini-Diffusers \
--case assets/testcases/i2i/i2i.json --num_frames 1
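To drive the single-GPU script from Python (for example, to loop over several test cases), the command above can be assembled as an argument list. This is a sketch under assumptions: the helper names are hypothetical, it assumes the working directory is the Bernini checkout, and appending --use_pe to infer_single_gpu.py is an assumption based on the prompt-enhancer flag described in this README.

```python
from __future__ import annotations

import subprocess
import sys


def build_infer_cmd(config: str, case: str, num_frames: int,
                    use_pe: bool = False) -> list[str]:
    """Assemble the infer_single_gpu.py invocation as an argv list."""
    cmd = [
        sys.executable, "infer_single_gpu.py",
        "--config", config,                # the Bernini-Diffusers directory
        "--case", case,                    # JSON test case, e.g. assets/testcases/i2i/i2i.json
        "--num_frames", str(num_frames),   # 1 for image tasks such as i2i
    ]
    if use_pe:
        # Assumption: the prompt-enhancer flag is accepted here; it needs BERNINI_PE_* settings.
        cmd.append("--use_pe")
    return cmd


def run_infer(cmd: list[str], repo_dir: str = ".") -> None:
    """Run the command from the Bernini checkout; raises CalledProcessError on failure."""
    subprocess.run(cmd, cwd=repo_dir, check=True)
```

Passing a list to subprocess.run (instead of a shell string) avoids quoting problems with paths that contain spaces.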
Prompt enhancer (highly recommended)
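Passing --use_pe enhances the input prompt with a chat model served through an OpenAI-compatible endpoint before generation. It is recommended for the best generation quality. Configure the endpoint with these environment variables: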
export BERNINI_PE_API_KEY=... # or OPENAI_API_KEY
export BERNINI_PE_BASE_URL=... # or OPENAI_BASE_URL
export BERNINI_PE_MODEL=... # vision-capable chat model
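The comments above suggest that each BERNINI_PE_* variable can fall back to its OPENAI_* counterpart. A minimal sketch of that resolution, useful for failing fast before a long run, is shown below; the precedence (Bernini-specific variable first) is an assumption read from those comments, the actual repository logic may differ, and the helper names are hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_pe_settings(env: Mapping[str, str] | None = None) -> dict[str, str | None]:
    """Resolve prompt-enhancer settings, preferring BERNINI_PE_* over OPENAI_*."""
    env = os.environ if env is None else env
    return {
        # Assumed precedence: the Bernini-specific variable wins when set and non-empty.
        "api_key": env.get("BERNINI_PE_API_KEY") or env.get("OPENAI_API_KEY"),
        "base_url": env.get("BERNINI_PE_BASE_URL") or env.get("OPENAI_BASE_URL"),
        # No fallback is documented for the model name.
        "model": env.get("BERNINI_PE_MODEL"),
    }


def missing_pe_settings(env: Mapping[str, str] | None = None) -> list[str]:
    """Names of settings that are still unset."""
    return [name for name, value in resolve_pe_settings(env).items() if not value]
```

Remember that BERNINI_PE_MODEL should name a vision-capable chat model, which this check cannot verify.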
Gradio demo
# Single GPU
python gradio_demo.py --config pretrained_models/Bernini-Diffusers --port 7860
# 8 GPUs, 8-way Ulysses sequence parallel
torchrun --nproc-per-node 8 gradio_demo.py --ulysses 8 \
--config pretrained_models/Bernini-Diffusers \
--port 7860 --share
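The --ulysses 8 flag refers to Ulysses-style sequence parallelism. As a conceptual sketch of the general DeepSpeed-Ulysses idea (not Bernini's or VeOmni's implementation), each rank starts with a contiguous chunk of the sequence for all attention heads; an all-to-all exchange regroups the data so each rank holds the full sequence for a subset of heads, letting attention run locally. The simulation below uses nested Python lists and assumes both the sequence length and the head count are divisible by the parallel degree.

```python
from __future__ import annotations

# Layout: tokens[i][h] is the feature of token i for attention head h.


def shard_sequence(tokens: list, degree: int) -> list[list]:
    """Split the sequence into `degree` contiguous shards, one per rank."""
    n = len(tokens)
    assert n % degree == 0, "sequence length must be divisible by the degree"
    step = n // degree
    return [tokens[r * step:(r + 1) * step] for r in range(degree)]


def ulysses_all_to_all(shards: list[list], num_heads: int) -> list[list]:
    """Convert sequence shards (all heads) into head shards (full sequence)."""
    degree = len(shards)
    assert num_heads % degree == 0, "head count must be divisible by the degree"
    heads_per_rank = num_heads // degree
    out = []
    for dst in range(degree):
        heads = range(dst * heads_per_rank, (dst + 1) * heads_per_rank)
        # Rank `dst` receives, from every source rank in order, the part of that
        # rank's sequence shard belonging to the heads assigned to `dst`.
        full_seq = [[token[h] for h in heads]
                    for src in range(degree) for token in shards[src]]
        out.append(full_seq)
    return out
```

In a real multi-GPU run the shards are tensors and the exchange is a collective such as torch.distributed.all_to_all; a second all-to-all after attention restores the sequence-sharded layout.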
Run scripts
The scripts/bernini/ directory in the Bernini repo provides ready-to-run task launchers for the full pipeline:
- run_t2i.sh
- run_i2i.sh
- run_t2v.sh
- run_v2v.sh
- run_rv2v.sh
- run_r2v.sh
- run_gradio.sh
You can override the model directory with:
export BERNINI_CONFIG=/path/to/Bernini-Diffusers
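When launching these scripts from Python, the override can be passed through the child process environment instead of exporting it in the shell. This sketch assumes the scripts live under scripts/bernini/ in the repository root, are run with bash from that root, and read BERNINI_CONFIG as described above; the helper names script_path and launch are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path

# Task names taken from the run_<task>.sh launchers listed above.
TASKS = ("t2i", "i2i", "t2v", "v2v", "rv2v", "r2v", "gradio")


def script_path(task: str, repo_dir: str | Path = ".") -> Path:
    """Map a task name to its launcher under scripts/bernini/."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    return Path(repo_dir) / "scripts" / "bernini" / f"run_{task}.sh"


def launch(task: str, config_dir: str | Path, repo_dir: str | Path = ".") -> None:
    """Run a launcher with BERNINI_CONFIG set only for the child process."""
    env = dict(os.environ, BERNINI_CONFIG=str(config_dir))
    subprocess.run(["bash", str(script_path(task, repo_dir))],
                   cwd=repo_dir, env=env, check=True)
```

Setting the variable per call keeps the parent environment unchanged, so different checkouts of Bernini-Diffusers can be compared from one session.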
Citation
If you use Bernini in your research, please cite:
@article{bernini,
title = {Bernini: Latent Semantic Planning for Video Diffusion},
author = {Chenchen Liu and Junyi Chen and Lei Li and Lu Chi and Mingzhen Sun and Zhuoying Li and Yi Fu and Ruoyu Guo and Yiheng Wu and Ge Bai and Zehuan Yuan},
journal = {arXiv preprint arXiv:2605.22344},
year = {2026}
}
Acknowledgements
Bernini builds on several outstanding open-source projects:
License
Apache License 2.0.