How to calculate video embeddings?
#24 opened by leaf-potato
I see that the Qwen-VL model supports video understanding, but the gme model seems to support only text and images. Are there any plans to support video embeddings?