title: Menu Text Detection
emoji: π
colorFrom: red
colorTo: red
sdk: gradio
python_version: 3.11
short_description: Extract structured menu information from images into JSON...
tags:
- donut
- fine-tuning
- image-to-text
- transformer
license: mit
# Menu Text Detection System
Extract structured menu information from images into JSON using either a fine-tuned end-to-end (E2E) model or an LLM.
[Hugging Face Space](https://huggingface.co/spaces/ryanlinjui/menu-text-detection)
[Hugging Face collection](https://huggingface.co/collections/ryanlinjui/menu-text-detection-670ccf527626bb004bbfb39b)
https://github.com/user-attachments/assets/80e5d54c-f2c8-4593-ad9b-499e5b71d8f6
## Features
### Overview
The system currently extracts the following information from menu images:
- **Restaurant Name**
- **Business Hours**
- **Address**
- **Phone Number**
- **Dish Information**
  - Name
  - Price
> For the JSON schema, see the [tools directory](./tools).
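
To make the shape of the extracted data concrete, here is a minimal Python sketch of a record covering the fields listed above. The class and field names (`MenuInfo`, `Dish`, `restaurant_name`, and so on) and the sample values are assumptions for illustration only; the actual schema lives in the tools directory and may differ.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Dish:
    # Hypothetical field names; the real schema is defined in ./tools.
    name: str
    price: float


@dataclass
class MenuInfo:
    restaurant_name: str = ""
    business_hours: str = ""
    address: str = ""
    phone_number: str = ""
    dishes: list[Dish] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so dishes become plain dicts.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Made-up sample values, not real extraction output.
menu = MenuInfo(
    restaurant_name="Example Diner",
    phone_number="000-0000",
    dishes=[Dish(name="Beef Noodles", price=120.0)],
)
print(menu.to_json())
```

Fields that cannot be read from an image default to empty values here, so every record has the same keys; that is a design choice of this sketch, not a documented behavior of the project.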
### Supported Methods to Extract Menu Information
#### Fine-tuned E2E model and Training metrics
- [**Donut (Document Parsing Task)**](https://huggingface.co/ryanlinjui/donut-base-finetuned-menu) - Base model by [Clova AI (ECCV '22)](https://github.com/clovaai/donut)
#### LLM Function Calling
- Google Gemini API
- OpenAI GPT API
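
With function calling, the LLM receives a JSON Schema describing the desired output and returns its answer as arguments to that function. The sketch below builds such a declaration as a plain dictionary in the shape used by OpenAI-style `tools` entries. The function name `extract_menu_data`, the field names, and the helpers `build_menu_tool` and `parse_arguments` are hypothetical and not taken from this repository; no API call is made.

```python
import json


def build_menu_tool() -> dict:
    """Return a function-calling tool declaration (all names are hypothetical)."""
    dish_schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
        },
        "required": ["name", "price"],
    }
    parameters = {
        "type": "object",
        "properties": {
            "restaurant_name": {"type": "string"},
            "business_hours": {"type": "string"},
            "address": {"type": "string"},
            "phone_number": {"type": "string"},
            "dishes": {"type": "array", "items": dish_schema},
        },
        "required": ["dishes"],
    }
    return {
        "type": "function",
        "function": {
            "name": "extract_menu_data",
            "description": "Extract structured menu information from an image.",
            "parameters": parameters,
        },
    }


def parse_arguments(arguments_json: str) -> dict:
    # OpenAI-style responses carry function arguments as a JSON string.
    return json.loads(arguments_json)


tool = build_menu_tool()
print(json.dumps(tool, indent=2))
```

Gemini function declarations accept a similar schema-style `parameters` object, though the surrounding wrapper differs, so the same field definitions could plausibly be reused for both providers.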
## Training / Fine-Tuning
### Setup
Use [uv](https://github.com/astral-sh/uv) to set up the development environment:
```bash
uv sync
```
> If `uv sync` runs into problems, use `pip install -r requirements.txt` instead.
### Training Script (Dataset Collection, Fine-Tuning)
Please refer to [`train.ipynb`](./train.ipynb). Use Jupyter Notebook for training:
```bash
uv run jupyter-notebook
```
> VS Code users: install the Jupyter extension, then select `.venv/bin/python` as your kernel.
### Run Demo Locally
```bash
uv run python app.py
```