Extract text from images and XML files using OCR models
VideoRefer x VideoLLaMA3
Nanonets OCR / Typhoon OCR / SmolDocling / MonkeyOCR
circle to erase