These are the models we used in our first version.
- google/owlv2-base-patch16-ensemble: Zero-Shot Object Detection, 0.2B parameters, 1.01M downloads, 121 likes
- Qwen/Qwen1.5-0.5B-Chat: Text Generation, 0.6B parameters, 85.9k downloads, 94 likes
- Salesforce/blip-image-captioning-base: Image-to-Text, 2.53M downloads, 855 likes
- facebook/sam-vit-base: Mask Generation, 93.7M parameters, 132k downloads, 168 likes
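
As a rough illustration of how the models listed above could be loaded, here is a minimal sketch using the `transformers` `pipeline` helper. The source does not say how the models were actually wired together, so the `MODEL_TASKS` mapping and the `load_pipeline` helper are hypothetical names, and the task strings are an assumption derived from each model's task label on the Hub.

```python
# Hub model IDs from the list above, paired with a transformers pipeline task.
# The task strings are an assumption based on each model's Hub task label.
MODEL_TASKS = {
    "google/owlv2-base-patch16-ensemble": "zero-shot-object-detection",
    "Qwen/Qwen1.5-0.5B-Chat": "text-generation",
    "Salesforce/blip-image-captioning-base": "image-to-text",
    "facebook/sam-vit-base": "mask-generation",
}


def load_pipeline(model_id: str):
    """Build a pipeline for one of the listed models.

    Hypothetical helper: weights are downloaded from the Hub on first use.
    """
    # Imported lazily so the mapping can be inspected without transformers installed.
    from transformers import pipeline

    return pipeline(task=MODEL_TASKS[model_id], model=model_id)
```

For example, `load_pipeline("Salesforce/blip-image-captioning-base")` would return a captioning pipeline that can be called on an image path or URL. Task availability can vary between `transformers` releases, so the task names may need adjusting for the installed version.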