title: Text to Image Generator
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: mit

ImageGenerator

GEMINI 2.5 PRO PROMPT ENHANCEMENT
Original prompt:  a dog in front of a desk
Enhanced prompt:  A photorealistic, eye-level shot of a Golden Retriever sitting by a wooden desk. Soft, warm morning light streams from a window, creating gentle highlights and deep shadows. The background is softly blurred, focusing on the dog's detailed fur. A cozy, cinematic atmosphere with an earthy color palette.
Word count: 47
Processing sample 1/1: A photorealistic, eye-level shot of a Golden Retriever sitting by a wooden desk. Soft, warm morning light streams from a window, creating gentle highlights and deep shadows. The background is softly blurred, focusing on the dog's detailed fur. A cozy, cinematic atmosphere with an earthy color palette.
100%|██████████| 30/30 [00:05<00:00,  5.08it/s]
  Saved as: generated_image_0.png
============================================================
DETAILED SAMPLE SCORES
============================================================
Average CLIPScore: 0.2753
Average GenEval Score: 0.6883
Inception Score: 1.0000 ± 0.0000
Inception Entropy: 7.5517
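
The Inception Score of 1.0000 ± 0.0000 in the log above is what the standard definition yields whenever a single image is scored (the log processes sample 1/1): the marginal class distribution then equals the image's own distribution, so the KL divergence is zero and the score is exp(0) = 1. The following is a minimal sketch of the textbook formula, not the application's evaluation code; the function name and the toy probability vectors are illustrative assumptions.

```python
import math


def inception_score(probs):
    """Textbook Inception Score: exp(mean over images of KL(p(y|x) || p(y))).

    probs is a list of per-image class-probability lists, each summing to 1.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_total = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute nothing to the KL divergence.
        kl_total += sum(
            pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0
        )
    return math.exp(kl_total / n)


# One image: the marginal equals the image's own distribution, so the score is 1.
single = inception_score([[0.7, 0.2, 0.1]])

# Two confidently different images over two classes give the maximum score of 2.
pair = inception_score([[1.0, 0.0], [0.0, 1.0]])
```

A score of 1 from a one-image run therefore says nothing about quality or diversity; the metric only becomes informative over a larger batch of generated images.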

Text to Image Generation Web App

A Flask-based web application that generates images from text prompts using Stable Diffusion and evaluates them using CLIP scores.

Features

  • 🎨 Generate images from text descriptions
  • 📊 CLIP score evaluation for image-text alignment
  • 💬 Chat-style interface
  • 🌙 Modern dark theme UI

Requirements

  • Python 3.8+
  • CUDA-capable GPU (recommended) or CPU
  • ~10GB disk space for models

Installation

  1. Clone or download the project files

  2. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    # or
    venv\Scripts\activate  # Windows
    
  3. Install dependencies

    pip install -r requirements.txt
    

Running the Application

  1. Start the Flask server

    python app.py
    
  2. Open your browser and go to:

    http://localhost:5000
    
  3. Enter a prompt and click "Generate" to create an image!

Project Structure

project/
├── app.py              # Main Flask application
├── templates/
│   └── index.html      # Web interface
├── static/
│   └── images/         # Static assets (optional)
├── requirements.txt    # Python dependencies
└── README.md           # This file

Notes

  • First run: The application will download required AI models (~5-7GB). This happens only once.
  • Generation time: Each image takes 30-60 seconds on GPU, longer on CPU.
  • Memory: Requires ~8GB VRAM (GPU) or ~16GB RAM (CPU).

Troubleshooting

Out of Memory

If you get CUDA out of memory errors, try:

  • Closing other GPU applications
  • Reducing num_inference_steps in app.py
  • Using CPU mode (slower but works with less memory)
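
The CPU fallback mentioned above could be selected along these lines. This is a sketch under assumptions: `pick_device` and its `force_cpu` flag are hypothetical and do not come from app.py; only `torch.cuda.is_available()` is an actual PyTorch call.

```python
def pick_device(force_cpu=False):
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu".

    force_cpu is a hypothetical switch for running on the CPU even when
    a GPU is present, for example after a CUDA out of memory error.
    """
    if force_cpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
```

The returned string could then be passed wherever app.py places its model on a device, for instance a `.to(device)` call.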

Slow Generation

  • CPU mode is significantly slower than GPU
  • Consider using a cloud GPU service for better performance

API Endpoints

  • GET / - Web interface
  • POST / - Generate image (form data: prompt)
  • GET /health - Health check
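
The endpoints above can be called from a script as well as from the browser. The sketch below only builds the requests, using the standard library; the helper names are illustrative, the base URL assumes the default address from "Running the Application", and the response formats are not documented here, so they are not parsed.

```python
from urllib import parse, request

# Assumption: the server runs at the default address given in this README.
BASE_URL = "http://localhost:5000"


def build_generate_request(prompt, base_url=BASE_URL):
    """Build the POST / request that sends the prompt as form data."""
    body = parse.urlencode({"prompt": prompt}).encode("utf-8")
    return request.Request(base_url + "/", data=body, method="POST")


def build_health_request(base_url=BASE_URL):
    """Build the GET /health request."""
    return request.Request(base_url + "/health", method="GET")
```

With the server running, a request is sent with `request.urlopen(build_health_request(), timeout=10)`; generation requests will likely need a much longer timeout, given the generation times listed under Notes.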

Scores Explained

  • CLIP Score: Measures how well the image matches the text (0-1, higher is better)
  • GenEval Score: Derived metric (CLIP × 2.5) for easier interpretation
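
As a worked example of the derived metric, the sketch below applies the 2.5 multiplier to the CLIP score from the sample run at the top of this README. The function and constant names are illustrative, not taken from app.py.

```python
GENEVAL_FACTOR = 2.5  # multiplier stated under "Scores Explained"


def geneval_from_clip(clip_score):
    """Return the derived GenEval score for a given CLIP score."""
    return clip_score * GENEVAL_FACTOR


# CLIP score reported in the sample run: 0.2753
sample_geneval = geneval_from_clip(0.2753)
```

Here 0.2753 × 2.5 = 0.68825, which agrees with the 0.6883 shown in the sample log once rounding of the displayed CLIP score is taken into account.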