DOGE: Towards Versatile Visual Document Grounding and Referring Paper • 2411.17125 • Published Nov 26, 2024
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion Paper • 2503.22262 • Published Mar 28
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO Paper • 2505.13031 • Published May 19
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Paper • 2509.18905 • Published Sep 23