ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use Paper • 2510.27363 • Published 6 days ago
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published 3 days ago
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models Paper • 2511.01618 • Published 3 days ago
World Simulation with Video Foundation Models for Physical AI Paper • 2511.00062 • Published 9 days ago
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum Paper • 2510.27571 • Published 6 days ago
The Underappreciated Power of Vision Models for Graph Structural Understanding Paper • 2510.24788 • Published 11 days ago
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation Paper • 2510.22115 • Published 13 days ago
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning Paper • 2510.20150 • Published 15 days ago
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Paper • 2510.27044 • Published 7 days ago
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published 7 days ago
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published 6 days ago
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published 8 days ago
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published 9 days ago