By the way, would you be open to exploring the possibility of accelerating this demo using SGLang, following the guide available here? I believe it could offer some nice performance improvements.
reacted to prithivMLmods's
post with šš2 months ago
The POINTS-Reader, a vision-language model for end-to-end document conversion, is a powerful, distillation-free Vision-Language Model that sets new SoTA benchmarks. The demo is now available on HF (Extraction, Preview, Documentation). The input consists of a fixed prompt and a document image, while the output contains only a string (the text extracted from the document image). š„š¤