Kimi-VL-A3B-Thinking-2506: A Quick Navigation • Article • By moonshotai and 1 other
No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL • Article • By toslali-ibm and 5 others
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning • Paper • arXiv:2506.04207
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis • Paper • arXiv:2506.02096
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL • Paper • arXiv:2505.17952 • Published May 23
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • arXiv:2504.10479 • Published Apr 14
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • Paper • arXiv:2412.05271 • Published Dec 6, 2024
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection • Paper • arXiv:2505.07293 • Published May 12
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly • Paper • arXiv:2505.10610 • Published May 15
PixMo • Collection • A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Apr 30
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning • Paper • arXiv:2503.21620 • Published Mar 27
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM • Paper • arXiv:2503.14478 • Published Mar 18
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning • Paper • arXiv:2503.10291 • Published Mar 13
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Article • By ariG23498 and 3 others • Published Mar 12
YuE: Scaling Open Foundation Models for Long-Form Music Generation • Paper • arXiv:2503.08638 • Published Mar 11