metadata
title: Menu Text Detection
emoji: 🐠
colorFrom: red
colorTo: red
sdk: gradio
python_version: 3.11
short_description: Extract structured menu information from images into JSON...
tags:
  - donut
  - fine-tuning
  - image-to-text
  - transformer
license: mit

Menu Text Detection System

Extract structured menu information from images into JSON using a fine-tuned E2E model or LLM.

Gradio Space Demo · Hugging Face Models & Datasets

https://github.com/user-attachments/assets/80e5d54c-f2c8-4593-ad9b-499e5b71d8f6

🚀 Features

Overview

The system currently extracts the following information from menu images:

  • Restaurant Name
  • Business Hours
  • Address
  • Phone Number
  • Dish Information
    • Name
    • Price

For the JSON schema, see the tools directory.
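As a rough illustration of the fields listed above, the extracted record could be modeled as below. This is only a sketch: the class names (`Menu`, `Dish`) and key names such as `restaurant_name` are assumptions made for the example, and the authoritative schema is the one in the tools directory.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Dish:
    # Hypothetical field names; the real schema may use different keys.
    name: str
    price: float


@dataclass
class Menu:
    # Every top-level field defaults to empty, since a menu image
    # may not show all of this information.
    restaurant_name: str = ""
    business_hours: str = ""
    address: str = ""
    phone_number: str = ""
    dishes: list[Dish] = field(default_factory=list)


# Build an example record and serialize it to JSON.
menu = Menu(
    restaurant_name="Example Diner",
    dishes=[Dish(name="Beef Noodles", price=120.0)],
)
menu_json = json.dumps(asdict(menu), ensure_ascii=False, indent=2)
print(menu_json)
```

`ensure_ascii=False` keeps non-ASCII dish names readable in the output instead of escaping them.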

Supported Methods to Extract Menu Information

Fine-tuned E2E model and Training metrics

LLM Function Calling

  • Google Gemini API
  • OpenAI GPT API
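With function calling, the model is handed a function declaration whose parameters are described by a JSON Schema, and it replies with a JSON argument string matching that schema. The sketch below shows one way such a declaration might look, in the general shape accepted by the `tools` parameter of the OpenAI Chat Completions API. The function name `extract_menu`, the field names, and the helper `parse_tool_arguments` are all hypothetical and not taken from this project. No API call is made; a hard-coded string stands in for the model's reply.

```python
import json

# Hypothetical tool definition; names and fields are illustrative only.
EXTRACT_MENU_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_menu",
        "description": "Return structured menu information read from an image.",
        "parameters": {
            "type": "object",
            "properties": {
                "restaurant_name": {"type": "string"},
                "business_hours": {"type": "string"},
                "address": {"type": "string"},
                "phone_number": {"type": "string"},
                "dishes": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "number"},
                        },
                        "required": ["name", "price"],
                    },
                },
            },
            "required": ["dishes"],
        },
    },
}


def parse_tool_arguments(arguments: str) -> dict:
    """Decode a tool call's JSON argument string and check top-level required keys."""
    data = json.loads(arguments)
    required = EXTRACT_MENU_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


# Simulated model output, standing in for a real API response.
sample_arguments = (
    '{"restaurant_name": "Example Diner", '
    '"dishes": [{"name": "Beef Noodles", "price": 120}]}'
)
parsed = parse_tool_arguments(sample_arguments)
print(parsed["dishes"][0]["name"])
```

The check here only covers top-level required keys. A full JSON Schema validator would be the more thorough choice if nested fields also need to be verified.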

💻 Training / Fine-Tuning

Setup

Use uv to set up the development environment:

uv sync

If uv sync fails, use pip install -r requirements.txt instead.

Training Script (Dataset Collection, Fine-Tuning)

See train.ipynb for the training workflow. Launch Jupyter Notebook with:

uv run jupyter-notebook

VS Code users should install the Jupyter extension, then select .venv/bin/python as the kernel.

Run Demo Locally

uv run python app.py