Chat with Kimi-VL-A3B-Thinking-2506 Space • Running on Zero • 123 • Chat with Kimi-VL-A3B-Thinking using text, images, and videos
Kimi-VL-A3B-Thinking-2506: A Quick Navigation Article • By moonshotai and 1 other • 4 days ago • 49
VDT: General-purpose Video Diffusion Transformers via Mask Modeling Paper • 2305.13311 • Published May 22, 2023
WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training Paper • 2103.06561 • Published Mar 11, 2021
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5, 2024 • 49
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8, 2024 • 47
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling Paper • 2302.06605 • Published Feb 13, 2023
Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs Paper • 2406.09367 • Published Jun 13, 2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining Paper • 2410.16166 • Published Oct 21, 2024
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22 • 119
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13 • 17
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published 28 days ago • 39