π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published 13 days ago • 59
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published 13 days ago • 67
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published 15 days ago • 70
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published 18 days ago • 94
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation Paper • 2510.17354 • Published 23 days ago • 33
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping Paper • 2510.08457 • Published Oct 9 • 12
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km Paper • 2510.09606 • Published Oct 10 • 17
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9 • 108
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29 • 44
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning Paper • 2509.23285 • Published Sep 27 • 13
Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks Paper • 2509.24473 • Published Sep 29 • 16
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing Paper • 2509.24900 • Published Sep 29 • 53
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Paper • 2508.10433 • Published Aug 14 • 143
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation Paper • 2508.07901 • Published Aug 11 • 39
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability Paper • 2508.07050 • Published Aug 9 • 116