AutoMind: Adaptive Knowledgeable Agent for Automated Data Science • arXiv:2506.10974 • Jun 2025
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning • arXiv:2506.10521 • Jun 2025
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • arXiv:2506.13585 • Jun 2025
Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights • arXiv:2506.02865 • Jun 2025
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion • arXiv:2506.01111 • Jun 2025
IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection • arXiv:2506.00979 • Jun 2025
Shifting AI Efficiency From Model-Centric to Data-Centric Compression • arXiv:2505.19147 • May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities • arXiv:2505.02567 • May 2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models • arXiv:2504.10449 • Apr 2025
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search • arXiv:2504.08066 • Apr 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search • arXiv:2504.09130 • Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning • arXiv:2504.09641 • Apr 2025
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization • arXiv:2504.10127 • Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model • arXiv:2504.10068 • Apr 2025
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability • arXiv:2504.08003 • Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning • arXiv:2504.08837 • Apr 2025