title: TextLens - AI-Powered OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit

🔍 TextLens - AI-Powered OCR

Deploy to HuggingFace | GitHub | Python 3.9+

TextLens is an OCR application built on a vision-language model (VLM). It extracts text from images with Microsoft Florence-2, falls back to EasyOCR automatically if the primary model fails, and ships with blue-green, zero-downtime deployment.

🚀 Live Demo

🔗 Try it now: https://huggingface.co/spaces/GoConqurer/textlens-ocr

✨ Key Features

🤖 Advanced AI-Powered OCR

  • Microsoft Florence-2 VLM: State-of-the-art vision-language model for text extraction
  • Intelligent Fallback System: Automatic fallback to EasyOCR if primary model fails
  • Multi-Model Support: Florence-2-base and Florence-2-large variants
  • Real-time Processing: Instant text extraction on image upload
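The fallback behaviour listed above can be pictured as a chain of OCR backends tried in order. What follows is a minimal sketch under stated assumptions, not the project's actual code: `extract_with_fallback` and the two stand-in backends are hypothetical names, and real backends would wrap Florence-2 and EasyOCR.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any callable that takes image bytes and returns text.
OCRBackend = Callable[[bytes], str]


def extract_with_fallback(
    image: bytes, backends: List[Tuple[str, OCRBackend]]
) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, text) from the first that works."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(image)
        except Exception as exc:  # any failure hands over to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all OCR backends failed: " + "; ".join(errors))


# Stand-in backends for illustration only.
def broken_primary(image: bytes) -> str:
    raise RuntimeError("model failed to load")


def working_fallback(image: bytes) -> str:
    return "hello world"


used, text = extract_with_fallback(
    b"fake-image-bytes",
    [("florence-2", broken_primary), ("easyocr", working_fallback)],
)
print(used, text)
```

Returning the backend name alongside the text lets the UI show which model produced the result, which may be useful when the fallback is less accurate than the primary model.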

🎨 Modern User Experience

  • Clean UI: Professional Gradio interface with intuitive design
  • Multiple Input Methods: Upload files, use webcam, or paste from clipboard
  • Copy-to-Clipboard: One-click text copying functionality
  • Responsive Design: Works seamlessly on desktop and mobile devices
  • Dark/Light Theme: Automatic theme adaptation

⚡ Performance & Reliability

  • GPU Acceleration: Supports CUDA, MPS (Apple Silicon), and CPU inference
  • Smart Device Detection: Automatically uses best available hardware
  • Error Resilience: Robust error handling with graceful degradation
  • Memory Optimization: Efficient model loading and cleanup
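Device detection of the kind described above usually reduces to a short priority check. The sketch below assumes PyTorch is the inference backend; `pick_device` and `detect_device` are hypothetical helper names and not taken from the project.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch if it is installed; otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    return pick_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )


print(detect_device())
```

Keeping the priority logic in a pure function makes it testable on machines without a GPU.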

🛡️ Enterprise Features

  • Zero Downtime Deployment: Blue-green deployment with health checks
  • Health Monitoring: Built-in /health and /ready endpoints
  • Graceful Shutdown: Signal handling for clean application restarts
  • Production Ready: Scalable architecture with automated deployment
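To illustrate how /health and /ready probes typically differ, here is a minimal standard-library sketch. It is an assumption about the general pattern, not the project's implementation: `MODEL_READY`, `probe`, and `ProbeHandler` are hypothetical names, and the response bodies are invented for the example.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Tuple

MODEL_READY = False  # hypothetical flag, flipped once the OCR model has loaded


def probe(path: str, model_ready: bool) -> Tuple[int, dict]:
    """Map a probe path to an HTTP status code and a JSON-serialisable body."""
    if path == "/health":
        # Liveness: the process is up and able to answer requests.
        return 200, {"status": "ok"}
    if path == "/ready":
        # Readiness: only accept traffic once the model is loaded.
        if model_ready:
            return 200, {"status": "ready"}
        return 503, {"status": "loading"}
    return 404, {"error": "not found"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = probe(self.path, MODEL_READY)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


print(probe("/ready", model_ready=False))
```

In a blue-green setup, traffic would normally switch to the new instance only after its /ready probe returns 200. The handler can be served with `http.server.HTTPServer` on a port of your choice.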

🚀 Quick Start

🌐 Online (Recommended)

Instant access, no installation required: 👉 Launch TextLens

💻 Local Development

  1. Clone Repository

    git clone https://github.com/KumarAmrit30/textlens-ocr.git
    cd textlens-ocr
    
  2. Setup Environment

    python -m venv textlens_env
    source textlens_env/bin/activate  # Windows: textlens_env\Scripts\activate
    pip install -r requirements.txt
    
  3. Launch Application

    python app.py
    

    🌐 Open: http://localhost:7860

🧪 Quick Test

# Verify installation
python -c "from models.ocr_processor import OCRProcessor; print('✅ TextLens ready!')"

📊 Model Performance

| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Florence-2-base | 270M | ⚡ Fast | 📈 High | General OCR, Real-time |
| Florence-2-large | 770M | 🐌 Medium | 📊 Very High | High accuracy needs |
| EasyOCR | ~100M | 🚀 Medium | 📋 Good | Fallback, Multilingual |

🎯 Supported Use Cases

| Category | Examples | Performance |
|---|---|---|
| 📄 Documents | PDFs, Scanned papers, Forms | ⭐⭐⭐⭐⭐ |
| 🧾 Receipts | Shopping receipts, Invoices | ⭐⭐⭐⭐ |
| 📱 Screenshots | App interfaces, Error messages | ⭐⭐⭐⭐⭐ |
| 🚗 Vehicle | License plates, VIN numbers | ⭐⭐⭐⭐ |
| 📚 Books | Printed text, Handwritten notes | ⭐⭐⭐⭐ |
| 🌐 Multilingual | Multiple languages | ⭐⭐⭐ |

🔧 Configuration

🎛️ Model Selection

from models.ocr_processor import OCRProcessor

# Fast inference (recommended)
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# Maximum accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
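If the model choice should come from a setting rather than a hard-coded string, a small lookup can map a short label to a checkpoint name. This is only a sketch: the preset labels and `resolve_model_name` are hypothetical, while the two checkpoint names are the ones shown above.

```python
# Checkpoint names come from the README; the preset labels are hypothetical.
MODEL_PRESETS = {
    "fast": "microsoft/Florence-2-base",
    "accurate": "microsoft/Florence-2-large",
}


def resolve_model_name(choice: str = "fast") -> str:
    """Accept a preset label or a full Hugging Face model id."""
    if choice in MODEL_PRESETS:
        return MODEL_PRESETS[choice]
    if "/" in choice:  # looks like an "org/model" id, pass it through unchanged
        return choice
    raise ValueError(f"unknown model choice: {choice!r}")


print(resolve_model_name("accurate"))
```

The result could then be passed on as `OCRProcessor(model_name=resolve_model_name("accurate"))`.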

🎨 UI Customization

Modify ui/styles.py to customize appearance:

# Change color scheme
PRIMARY_COLOR = "#1f77b4"
SECONDARY_COLOR = "#ff7f0e"

# Update layout
INTERFACE_WIDTH = "100%"
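How ui/styles.py consumes these constants is not shown here. One plausible approach, sketched below, is to render them into a CSS string for Gradio. `build_css` is a hypothetical helper, and the CSS selectors are assumptions that may need adjusting for the Gradio version in use.

```python
PRIMARY_COLOR = "#1f77b4"
SECONDARY_COLOR = "#ff7f0e"
INTERFACE_WIDTH = "100%"


def build_css(primary: str, secondary: str, width: str) -> str:
    """Render the style constants into a CSS string."""
    return (
        f".gradio-container {{ max-width: {width}; }}\n"
        f"button.primary {{ background: {primary}; }}\n"
        f"a {{ color: {secondary}; }}\n"
    )


css = build_css(PRIMARY_COLOR, SECONDARY_COLOR, INTERFACE_WIDTH)
print(css)
```

With Gradio 4.x, a string like this can be passed as `gr.Blocks(css=css)`.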

⚙️ Environment Variables

| Variable | Description | Default |
|---|---|---|
| SPACE_ID | HuggingFace Space ID | Auto-detected |
| DEPLOYMENT_STAGE | Deployment stage | production |
| TRANSFORMERS_CACHE | Model cache path | ~/.cache/huggingface |
| CUDA_VISIBLE_DEVICES | GPU selection | All available |
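The variables in the table can be read in one place with their defaults applied. The following is a sketch only: `load_settings` and the dictionary keys are hypothetical, and treating an unset `SPACE_ID` as "not running in a Space" is an assumption.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    return {
        # None is assumed to mean "not running inside a Space".
        "space_id": env.get("SPACE_ID"),
        "deployment_stage": env.get("DEPLOYMENT_STAGE", "production"),
        "transformers_cache": env.get(
            "TRANSFORMERS_CACHE", os.path.expanduser("~/.cache/huggingface")
        ),
        # None means no restriction, so every GPU stays visible.
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES"),
    }


settings = load_settings({"DEPLOYMENT_STAGE": "staging"})
print(settings["deployment_stage"])
```

Passing a plain mapping instead of `os.environ` keeps the function easy to test.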

Deployment Flow:

graph LR
    A[Code Push] --> B[Validate]
    B --> C[Deploy Staging]
    C --> D[Health Check]
    D --> E[Deploy Production]
    E --> F[Verify]
    F --> G[Complete ✅]
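The flow above is a gated sequence: each stage runs only if the previous one succeeded. A minimal sketch of that control flow follows. `run_pipeline` and the placeholder steps are hypothetical, and real steps would call the actual deployment tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first one that reports failure."""
    completed = []
    for name, step in stages:
        if not step():
            raise RuntimeError(f"stage failed: {name} (completed: {completed})")
        completed.append(name)
    return completed


# Placeholder steps that always succeed.
stages = [
    ("validate", lambda: True),
    ("deploy_staging", lambda: True),
    ("health_check", lambda: True),
    ("deploy_production", lambda: True),
    ("verify", lambda: True),
]
print(run_pipeline(stages))
```

Because a failed health check raises before `deploy_production` runs, production is never touched by a build that did not pass staging.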

🀝 Contributing

We welcome contributions! Here's how to get started:

πŸ”§ Development Setup

  1. Fork & Clone

    git clone https://github.com/YOUR_USERNAME/textlens-ocr.git
    cd textlens-ocr
    
  2. Create Branch

    git checkout -b feature/your-feature-name
    
  3. Make Changes

    • Add new features or fix bugs
    • Update tests and documentation
    • Follow code style guidelines
  4. Test Changes

    python -m pytest tests/
    python -c "from models.ocr_processor import OCRProcessor; OCRProcessor()"
    
  5. Submit PR

    git add .
    git commit -m "feat: add your feature description"
    git push origin feature/your-feature-name
    

📝 Contribution Guidelines

  • Code Style: Follow PEP 8, use Black formatter
  • Documentation: Update README and docstrings
  • Tests: Add tests for new functionality
  • Commits: Use conventional commit messages
  • Issues: Link PRs to relevant issues

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Third-Party Licenses

🌟 Acknowledgments

Special thanks to:

  • Microsoft Research for the incredible Florence-2 vision-language model
  • HuggingFace for the transformers library and Spaces platform
  • Gradio Team for the amazing web interface framework
  • JaidedAI for EasyOCR fallback capabilities
  • Open Source Community for continuous support and contributions

📈 Project Status

| Component | Status | Version |
|---|---|---|
| Core OCR | ✅ Stable | v1.0.0 |
| Web UI | ✅ Stable | v1.0.0 |
| Deployment | ✅ Production | v1.0.0 |
| API | ✅ Stable | v1.0.0 |
| Documentation | ✅ Complete | v1.0.0 |


Made with ❤️ for the AI community

⭐ Star this repo • 🔗 Try the demo • 📖 Read docs