Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published Oct 10, 2024 • 25
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 107
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 35
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 15 days ago • 522
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated 15 days ago • 120
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 62
DeepSeek R1 (All Versions) Collection DeepSeek-R1-0528 is here! The most powerful reasoning open LLM, available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 37 items • Updated 7 days ago • 254
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions Paper • 2501.10020 • Published Jan 17 • 24
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 105
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 81
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 51
NVLM 1.0 Collection A family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks and text-only tasks. • 2 items • Updated 15 days ago • 51
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 88
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19, 2024 • 36