- Qwen2.5 VL 32B Instruct Demo
  🏃 Chat with images and videos using Qwen
- Qwen2.5 VL 72B Instruct
  💻 Interact with a multimodal chatbot using text and images
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 200
- Qwen/Qwen2.5-VL-32B-Instruct
  Image-Text-to-Text • 33B • Updated • 496k • 417 (see the usage sketch below)
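The demos and the Qwen/Qwen2.5-VL-32B-Instruct model card above all expose the same chat-with-images workflow. Below is a minimal sketch of loading the listed checkpoint with Hugging Face transformers; it assumes a recent transformers release with Qwen2.5-VL support, the qwen_vl_utils helper package, enough GPU memory for a 33B-parameter model, and a placeholder image URL.

```python
# Minimal sketch: chat with an image using the Qwen2.5-VL-32B-Instruct checkpoint.
# Assumptions: a transformers release with Qwen2.5-VL support, the qwen-vl-utils
# package, and a GPU large enough for a 33B-parameter model (a smaller checkpoint
# such as Qwen/Qwen2.5-VL-7B-Instruct can be substituted).
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-32B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A single-turn conversation mixing an image and a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/chart.png"},  # hypothetical URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the chat prompt and collect the image/video inputs it references.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
generated = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```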
Collections including paper arxiv:2502.13923:
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 200
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
  Paper • 2404.05719 • Published • 83
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent
  Paper • 2411.17465 • Published • 89
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
  Paper • 2409.12191 • Published • 78

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 43

- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 122
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
  Paper • 2502.10391 • Published • 35
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 18
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
  Paper • 2502.08130 • Published • 9

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 84
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 24

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 35
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88