Update README.md
README.md CHANGED

@@ -9,6 +9,8 @@ pipeline_tag: visual-question-answering
 tags:
 - multimodal large language model
 - large video-language model
+base_model:
+- DAMO-NLP-SG/VideoLLaMA3-2B-Image
 ---
 
 
@@ -27,11 +29,13 @@ VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
 
 <div style="display: flex; justify-content: center; margin-top: 10px;">
 <a href="https://arxiv.org/pdf/2501.00599"><img src="https://img.shields.io/badge/Arxiv-2501.00599-ECA8A7" style="margin-right: 5px;"></a>
+<a href="https://huggingface.co/spaces/lixin4ever/VideoRefer-VideoLLaMA3"><img src='https://img.shields.io/badge/HuggingFace-Demo-96D03A' style="margin-right: 5px;"></a>
 <a href="https://github.com/DAMO-NLP-SG/VideoRefer"><img src='https://img.shields.io/badge/Github-VideoRefer-F7C97E' style="margin-right: 5px;"></a>
 <a href="https://github.com/DAMO-NLP-SG/VideoLLaMA3"><img src='https://img.shields.io/badge/Github-VideoLLaMA3-9DC3E6' style="margin-right: 5px;"></a>
 </div>
 
 ## 📰 News
+* **[2025.6.19]** 🔥 We release the [demo](https://huggingface.co/spaces/lixin4ever/VideoRefer-VideoLLaMA3) of VideoRefer-VideoLLaMA3, hosted on HuggingFace. Feel free to try it!
 * **[2025.6.18]** 🔥 We release a new version of VideoRefer ([VideoRefer-VideoLLaMA3-7B](https://huggingface.co/DAMO-NLP-SG/VideoRefer-VideoLLaMA3-7B) and [VideoRefer-VideoLLaMA3-2B](https://huggingface.co/DAMO-NLP-SG/VideoRefer-VideoLLaMA3-2B)), built on [VideoLLaMA3](https://github.com/DAMO-NLP-SG/VideoLLaMA3).
 * **[2025.4.22]** 🔥 Our VideoRefer-Bench has been adopted by the [Describe Anything Model](https://arxiv.org/pdf/2504.16072) (NVIDIA & UC Berkeley).
 * **[2025.2.27]** 🔥 VideoRefer Suite has been accepted to CVPR 2025!