SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 24 days ago • 100
Article AI Policy @🤗: Response to the 2025 National AI R&D Strategic Plan By evijit and 2 others • 24 days ago • 13
Article How to generate text: using different decoding methods for language generation with Transformers By patrickvonplaten • Mar 1, 2020 • 218
Post 585 Why do people sleep on DSE multimodal retrieval models? They're just like ColPali, but highly scalable and fast, and you can make them even more efficient with binarization or matryoshka with little degradation 💪 I made a small collection of them so you can get started: merve/multimodal-dse-retrievers-67fe71a9c8f1ad26a48859c3 Image taken from MCDSE blog https://huggingface.co/blog/marco/announcing-mcdse-2b-v1
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 191
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28 • 496
Post 2401 we have a leaderboard for video LLMs, and most of the top models are open ones! opencompass/openvlm_video_leaderboard we are so back 🔥