metadata

title: Menu Text Detection
emoji: 🐠
colorFrom: red
colorTo: red
sdk: gradio
python_version: 3.11
short_description: Extract structured menu information from images into JSON...
tags:
  - donut
  - fine-tuning
  - image-to-text
  - transformer
license: mit

Menu Text Detection System

Extract structured menu information from images into JSON, using either a fine-tuned end-to-end (E2E) model or an LLM.

https://github.com/user-attachments/assets/80e5d54c-f2c8-4593-ad9b-499e5b71d8f6

🚀 Features

Overview

The system currently extracts the following information from menu images:

Restaurant Name
Business Hours
Address
Phone Number
Dish Information
- Name
- Price

For the JSON schema, see the tools directory.
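As a rough illustration of how the fields listed above could map onto a JSON document, here is a minimal Python sketch. The class names (`MenuInfo`, `Dish`), the key names, and the sample values are assumptions made for this example; the authoritative schema is the one in the tools directory and may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Dish:
    # Hypothetical field names; check the schema in the tools directory.
    name: str
    price: Optional[float] = None


@dataclass
class MenuInfo:
    # One attribute per item in the overview list above.
    restaurant_name: str = ""
    business_hours: str = ""
    address: str = ""
    phone_number: str = ""
    dishes: List[Dish] = field(default_factory=list)


# Made-up sample data, not output from the real system.
menu = MenuInfo(
    restaurant_name="Example Cafe",
    phone_number="555-0100",
    dishes=[Dish("Latte", 4.5), Dish("Mocha")],
)

# asdict() recurses into the nested Dish dataclasses.
menu_json = json.dumps(asdict(menu), ensure_ascii=False, indent=2)
print(menu_json)
```

Keeping the dish list as a nested array mirrors the way "Dish Information" groups a name and a price per dish.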

Supported Methods to Extract Menu Information

Fine-tuned E2E model and Training metrics

Donut (Document Parsing Task) - Base model by Clova AI (ECCV ’22)
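Donut does not emit JSON directly. It generates a sequence of tags such as `<s_key>value</s_key>`, with `<sep/>` between list items, which is then converted to JSON. The sketch below is a simplified, pure-Python illustration of that conversion and is not the library's own parser. The tag names (`restaurant`, `dishes`, `name`, `price`) and the sample sequence are hypothetical, and the sketch assumes `<sep/>` only appears directly inside the field it separates.

```python
import re

OPEN_TAG = re.compile(r"<s_(\w+)>")


def donut_tokens_to_dict(seq: str) -> dict:
    """Convert a Donut-style tag sequence to a dict (simplified sketch)."""
    result = {}
    pos = 0
    while (match := OPEN_TAG.search(seq, pos)) is not None:
        key = match.group(1)
        close = f"</s_{key}>"
        end = seq.find(close, match.end())
        if end == -1:
            break  # unbalanced tag: stop instead of guessing
        body = seq[match.end():end]
        # "<sep/>" separates repeated items, e.g. one per dish.
        parts = [part.strip() for part in body.split("<sep/>")]
        values = [
            donut_tokens_to_dict(part) if "<s_" in part else part
            for part in parts
        ]
        # A single item stays a scalar or dict; several items become a list.
        result[key] = values[0] if len(values) == 1 else values
        pos = end + len(close)
    return result


# Hand-written sample sequence, not real model output.
sample = (
    "<s_restaurant>Cafe Uno</s_restaurant>"
    "<s_dishes><s_name>Latte</s_name><s_price>4.5</s_price>"
    "<sep/><s_name>Mocha</s_name><s_price>5</s_price></s_dishes>"
)
parsed = donut_tokens_to_dict(sample)
print(parsed)
```

Note that values stay strings in this sketch; converting prices to numbers would be a separate post-processing step.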

LLM Function Calling

Google Gemini API
OpenAI GPT API
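With function calling, the model receives a JSON Schema describing a function's parameters and replies with arguments that match it, which gives structured output without free-text parsing. The sketch below shows what such a declaration might look like, using the tool layout of OpenAI's Chat Completions API; Gemini takes a comparable function declaration. The function name `extract_menu`, the property names, and the helper `parse_tool_arguments` are assumptions for illustration and are not taken from this project's code. No API call is made.

```python
import json

# Hypothetical tool declaration in the OpenAI Chat Completions "tools" layout.
EXTRACT_MENU_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_menu",
        "description": "Record structured information read from a menu image.",
        "parameters": {
            "type": "object",
            "properties": {
                "restaurant_name": {"type": "string"},
                "business_hours": {"type": "string"},
                "address": {"type": "string"},
                "phone_number": {"type": "string"},
                "dishes": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "number"},
                        },
                        "required": ["name"],
                    },
                },
            },
            "required": ["dishes"],
        },
    },
}


def parse_tool_arguments(arguments: str) -> dict:
    """Decode the JSON argument string of a tool call and check required keys."""
    data = json.loads(arguments)
    required = EXTRACT_MENU_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


# Hand-written stand-in for the argument string a model might return.
fake_arguments = '{"restaurant_name": "Example Cafe", "dishes": [{"name": "Latte", "price": 4.5}]}'
menu = parse_tool_arguments(fake_arguments)
print(menu["dishes"][0]["name"])
```

Checking required keys after decoding is a cheap guard, since a model can still return incomplete arguments.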

💻 Training / Fine-Tuning

Setup

Use uv to set up the development environment:

uv sync

If uv sync fails, install the dependencies with pip install -r requirements.txt instead.

Training Script (Datasets collecting, Fine-Tuning)

Please refer to train.ipynb. Use Jupyter Notebook for training:

uv run jupyter-notebook

For VSCode users, install the Jupyter extension, then select .venv/bin/python as the kernel.

Run Demo Locally

uv run python app.py