--- title: Menu Text Detection emoji: 🐠 colorFrom: red colorTo: red sdk: gradio python_version: 3.11 short_description: Extract structured menu information from images into JSON... tags: - donut - fine-tuning - image-to-text - transformer license: mit --- # Menu Text Detection System Extract structured menu information from images into JSON using a fine-tuned E2E model or LLM. [![Gradio Space Demo](https://img.shields.io/badge/GradioSpace-Demo-important?logo=huggingface)](https://huggingface.co/spaces/ryanlinjui/menu-text-detection) [![Hugging Face Models & Datasets](https://img.shields.io/badge/HuggingFace-Models_&_Datasets-important?logo=huggingface)](https://huggingface.co/collections/ryanlinjui/menu-text-detection-670ccf527626bb004bbfb39b) https://github.com/user-attachments/assets/80e5d54c-f2c8-4593-ad9b-499e5b71d8f6 ## 🚀 Features ### Overview Currently supports the following information from menu images: - **Restaurant Name** - **Business Hours** - **Address** - **Phone Number** - **Dish Information** - Name - Price > For the JSON schema, see [tools directory](./tools). ### Supported Methods to Extract Menu Information #### Fine-tuned E2E model and Training metrics - [**Donut (Document Parsing Task)**](https://huggingface.co/ryanlinjui/donut-base-finetuned-menu) - Base model by [Clova AI (ECCV ’22)](https://github.com/clovaai/donut) #### LLM Function Calling - Google Gemini API - OpenAI GPT API ## 💻 Training / Fine-Tuning ### Setup Use [uv](https://github.com/astral-sh/uv) to set up the development environment: ```bash uv sync ``` > or use `pip install -r requirements.txt` if it has any problems ### Training Script (Datasets collecting, Fine-Tuning) Please refer [`train.ipynb`](./train.ipynb). Use Jupyter Notebook for training: ```bash uv run jupyter-notebook ``` > For VSCode users, please install Jupyter extension, then select `.venv/bin/python` as your kernel. ### Run Demo Locally ```bash uv run python app.py ```