SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 18 days ago • 84
CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives Paper • 2605.12496 • Published 20 days ago • 29
TextLDM: Language Modeling with Continuous Latent Diffusion Paper • 2605.07748 • Published 24 days ago • 26
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning Paper • 2601.11141 • Published Jan 16 • 23
End-to-End Video Character Replacement without Structural Guidance Paper • 2601.08587 • Published Jan 13 • 8
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion Paper • 2512.23709 • Published Dec 29, 2025 • 51
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 99
Openly licensed large image datasets Collection Openly licensed datasets that permit commercial use • 3 items • Updated Jul 1, 2024 • 1
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated Mar 12 • 66
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10, 2025 • 53
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 189
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes Paper • 2506.00227 • Published May 30, 2025 • 12