In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published 7 days ago • 28 • 2
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published 17 days ago • 156
SparseD: Sparse Attention for Diffusion Language Models Paper • 2509.24014 • Published Sep 28 • 30
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16 • 43
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24 • 25
VeriThinker: Learning to Verify Makes Reasoning Model Efficient Paper • 2505.17941 • Published May 23 • 25
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding Paper • 2505.16990 • Published May 22 • 22
dKV-Cache: The Cache for Diffusion Language Models Paper • 2505.15781 • Published May 21 • 16 • 2
Introducing Visual Perception Token into Multimodal Large Language Model Paper • 2502.17425 • Published Feb 24 • 16
CoT-Valve: Length-Compressible Chain-of-Thought Tuning Paper • 2502.09601 • Published Feb 13 • 14 • 2