metadata
title: Menu Text Detection
emoji: π
colorFrom: red
colorTo: red
sdk: gradio
python_version: 3.11
short_description: Extract structured menu information from images into JSON...
tags:
- donut
- fine-tuning
- image-to-text
- transformer
license: mit
Menu Text Detection System
Extract structured menu information from images into JSON using a fine-tuned E2E model or LLM.
https://github.com/user-attachments/assets/80e5d54c-f2c8-4593-ad9b-499e5b71d8f6
Features
Overview
The system currently extracts the following information from menu images:
- Restaurant Name
- Business Hours
- Address
- Phone Number
- Dish Information
- Name
- Price
For the JSON schema, see the tools directory.
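As a rough illustration of the fields listed above, the extracted result could be modeled as follows. This is a minimal sketch, not the project's schema: the class names (`MenuInfo`, `Dish`), the field names, and the helper `menu_to_json` are all assumptions made for this example. The authoritative schema is the one in the tools directory.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Dish:
    # Hypothetical field names; the real schema may differ.
    name: str
    price: float


@dataclass
class MenuInfo:
    # One entry per item in the feature list; all names are assumptions.
    restaurant_name: str = ""
    business_hours: str = ""
    address: str = ""
    phone_number: str = ""
    dishes: list[Dish] = field(default_factory=list)


def menu_to_json(menu: MenuInfo) -> str:
    """Serialize the extracted menu to a JSON string, keeping non-ASCII text readable."""
    return json.dumps(asdict(menu), ensure_ascii=False, indent=2)


example = MenuInfo(
    restaurant_name="Example Bistro",
    dishes=[Dish(name="Beef Noodles", price=180.0)],
)
print(menu_to_json(example))
```

Fields that cannot be read from the image simply stay empty here; whether the actual schema uses empty strings, nulls, or omits the key is not stated in this README.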
Supported Methods to Extract Menu Information
Fine-tuned E2E model and Training metrics
- Donut (Document Parsing Task) - Base model by Clova AI (ECCV '22)
LLM Function Calling
- Google Gemini API
- OpenAI GPT API
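With function calling, the LLM is given a JSON-Schema description of a function and returns its arguments as structured JSON instead of free text. The sketch below shows what such a declaration might look like in the general shape used for OpenAI-style chat tools. The function name `extract_menu_info`, the field names, and the helper `parse_tool_arguments` are hypothetical; the project's actual schema lives in the tools directory and may be organized differently.

```python
import json

# Hypothetical tool declaration; names and fields are assumptions for illustration.
EXTRACT_MENU_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_menu_info",
        "description": "Return the structured information read from a menu image.",
        "parameters": {
            "type": "object",
            "properties": {
                "restaurant_name": {"type": "string"},
                "business_hours": {"type": "string"},
                "address": {"type": "string"},
                "phone_number": {"type": "string"},
                "dishes": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "number"},
                        },
                        "required": ["name", "price"],
                    },
                },
            },
            "required": ["dishes"],
        },
    },
}


def parse_tool_arguments(arguments: str) -> dict:
    """Decode a tool call's JSON argument string and check the required top-level keys."""
    data = json.loads(arguments)
    required = EXTRACT_MENU_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


# Example: arguments as they might come back from a model's tool call.
result = parse_tool_arguments('{"dishes": [{"name": "Tea", "price": 30}]}')
print(result["dishes"][0]["name"])
```

The check here only covers required top-level keys. A full JSON-Schema validator would be the more thorough choice if malformed model output is a concern.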
Training / Fine-Tuning
Setup
Use uv to set up the development environment:
uv sync
If uv sync fails, install the dependencies with pip instead:
pip install -r requirements.txt
Training Script (Dataset Collection, Fine-Tuning)
Please refer to train.ipynb. Use Jupyter Notebook for training:
uv run jupyter-notebook
VS Code users: install the Jupyter extension, then select
.venv/bin/python
as your kernel.
Run Demo Locally
uv run python app.py