Yesterday was the day of vision language action models (VLAs)!

> SmolVLA: open-source small VLA for robotics by the Hugging Face LeRobot team 🤖
Blog: https://huggingface.co/blog/smolvla
Model: lerobot/smolvla_base

> Holo-1: 3B & 7B web/computer-use agentic VLAs by H Company 💻
Model family: Hcompany/holo1-683dd1eece7eb077b96d0cbd
Demo: https://huggingface.co/spaces/multimodalart/Holo1
Blog: https://huggingface.co/blog/Hcompany/holo1

Super exciting times!!
ARIA: Training Language Agents with Intention-Driven Reward Aggregation • Paper • 2506.00539 • Published 26 days ago
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? • Paper • 2505.23359 • Published 28 days ago