Why Far Looks Up: Probing Spatial Representation in Vision-Language Models Paper • 2605.30161 • Published 5 days ago • 52
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 6 days ago • 68
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning Paper • 2605.28691 • Published 6 days ago • 21
Rethinking Memory as Continuously Evolving Connectivity Paper • 2605.28773 • Published 6 days ago • 29
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence Paper • 2605.26340 • Published 8 days ago • 33
ResearchMath-14K: Scaling Research-Level Mathematics via Agents Paper • 2605.28003 • Published 6 days ago • 47
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 18 days ago • 36
Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding Paper • 2605.02290 • Published 29 days ago • 40
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published 15 days ago • 112
MMSkills: Towards Multimodal Skills for General Visual Agents Paper • 2605.13527 • Published 19 days ago • 118
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 20 days ago • 269
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis Paper • 2605.18451 • Published 15 days ago • 41
VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection Paper • 2503.03797 • Published Mar 5, 2025 • 1
A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks Paper • 2501.15724 • Published Jan 27, 2025 • 1