Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness • Paper • 2504.01901 • Published Apr 2
PairUni: Pairwise Training for Unified Multimodal Language Models • Paper • 2510.25682 • Published Oct 29 • 13
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs • Paper • 2511.07250 • Published 29 days ago • 17
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence • Paper • 2510.20579 • Published Oct 23 • 55
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving • Paper • 2510.12796 • Published Oct 14 • 12
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs • Paper • 2510.18876 • Published Oct 21 • 36
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer • Paper • 2504.10462 • Published Apr 14 • 15
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology • Paper • 2507.07999 • Published Jul 10 • 49
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions • Paper • 2309.03576 • Published Sep 7, 2023 • 2
OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction • Paper • 2410.23278 • Published Oct 30, 2024 • 2
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels • Paper • 2203.03884 • Published Mar 8, 2022 • 1
Balancing Logit Variation for Long-tailed Semantic Segmentation • Paper • 2306.02061 • Published Jun 3, 2023 • 1
Learning from Future: A Novel Self-Training Framework for Semantic Segmentation • Paper • 2209.06993 • Published Sep 15, 2022 • 1
Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation • Paper • 2306.02314 • Published Jun 4, 2023 • 1
Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation • Paper • 2305.13752 • Published May 23, 2023 • 1