title: TextLens - AI-Powered OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
TextLens - AI-Powered OCR
An OCR application built on a vision-language model (VLM). It extracts text from images using Microsoft Florence-2, falls back automatically to EasyOCR when the primary model fails, and is set up for zero-downtime (blue-green) deployment.
Live Demo
Try it now: https://huggingface.co/spaces/GoConqurer/textlens-ocr
Key Features
Advanced AI-Powered OCR
- Microsoft Florence-2 VLM: State-of-the-art vision-language model for text extraction
- Intelligent Fallback System: Automatic fallback to EasyOCR if primary model fails
- Multi-Model Support: Florence-2-base and Florence-2-large variants
- Real-time Processing: Instant text extraction on image upload
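The fallback behaviour described in this list can be sketched as an ordered chain of engines, where the first one that succeeds wins. This is a minimal illustration only: `extract_with_fallback` and the two stub engines are hypothetical names, not the project's actual API, and the stubs stand in for the real Florence-2 and EasyOCR wrappers.

```python
from typing import Callable, Sequence, Tuple


def extract_with_fallback(image, engines: Sequence[Tuple[str, Callable]]):
    """Try each (name, engine) pair in order and return the first success."""
    errors = []
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception as exc:  # any failure moves on to the next engine
            errors.append(f"{name}: {exc}")
            continue
        return name, text
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real Florence-2 and EasyOCR wrappers.
def florence_stub(image):
    raise RuntimeError("model failed to load")


def easyocr_stub(image):
    return "hello world"


engine_used, text = extract_with_fallback(
    b"fake image bytes",
    [("florence-2", florence_stub), ("easyocr", easyocr_stub)],
)
print(engine_used, text)
```

Returning the engine name alongside the text lets the caller log or display which model actually produced the result.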
Modern User Experience
- Clean UI: Professional Gradio interface with intuitive design
- Multiple Input Methods: Upload files, use webcam, or paste from clipboard
- Copy-to-Clipboard: One-click text copying functionality
- Responsive Design: Works seamlessly on desktop and mobile devices
- Dark/Light Theme: Automatic theme adaptation
Performance & Reliability
- GPU Acceleration: Supports CUDA, MPS (Apple Silicon), and CPU inference
- Smart Device Detection: Automatically uses best available hardware
- Error Resilience: Robust error handling with graceful degradation
- Memory Optimization: Efficient model loading and cleanup
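One plausible way to implement the device detection listed here is to prefer CUDA, then MPS, then CPU. The sketch below assumes PyTorch is the inference backend; the function names are illustrative and not taken from the project's code.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple Silicon MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch if it is installed; otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

Keeping the priority logic in `pick_device` separate from the PyTorch queries makes it testable on machines without a GPU.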
Enterprise Features
- Zero Downtime Deployment: Blue-green deployment with health checks
- Health Monitoring: Built-in /health and /ready endpoints
- Graceful Shutdown: Signal handling for clean application restarts
- Production Ready: Scalable architecture with automated deployment
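To illustrate the difference between the /health and /ready endpoints, here is a minimal standard-library sketch. It assumes that /health reports liveness and /ready reports whether the model has finished loading. The response payloads, the `STATE` flag, and port 8080 are assumptions made for the example, not the application's documented behaviour.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed application state: set to True once the OCR model has loaded.
STATE = {"model_loaded": False}


def probe(path: str, model_loaded: bool):
    """Return (HTTP status, JSON payload) for a probe path."""
    if path == "/health":
        # Liveness: the process is up and able to answer requests.
        return 200, {"status": "ok"}
    if path == "/ready":
        # Readiness: only accept traffic once the model is loaded.
        if model_loaded:
            return 200, {"status": "ready"}
        return 503, {"status": "loading"}
    return 404, {"error": "not found"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = probe(self.path, STATE["model_loaded"])
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) the probe server; the port is arbitrary."""
    return HTTPServer(("127.0.0.1", port), ProbeHandler)
```

Calling `make_server().serve_forever()` starts the probe server. A blue-green rollout would typically switch traffic to the new instance only after /ready returns 200.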
Quick Start
Online (Recommended)
Instant access, no installation required: launch TextLens from the Live Demo link above.
Local Development
Clone Repository
git clone https://github.com/KumarAmrit30/textlens-ocr.git
cd textlens-ocr
Setup Environment
python -m venv textlens_env
source textlens_env/bin/activate # Windows: textlens_env\Scripts\activate
pip install -r requirements.txt
Launch Application
python app.py
Open: http://localhost:7860
Quick Test
# Verify installation
python -c "from models.ocr_processor import OCRProcessor; print('TextLens ready!')"
Model Performance
Model | Size | Speed | Accuracy | Best For |
---|---|---|---|---|
Florence-2-base | 270M | Fast | High | General OCR, Real-time |
Florence-2-large | 770M | Medium | Very High | High accuracy needs |
EasyOCR | ~100M | Medium | Good | Fallback, Multilingual |
Supported Use Cases
Category | Examples | Performance |
---|---|---|
Documents | PDFs, Scanned papers, Forms | ⭐⭐⭐⭐⭐ |
Receipts | Shopping receipts, Invoices | ⭐⭐⭐⭐ |
Screenshots | App interfaces, Error messages | ⭐⭐⭐⭐⭐ |
Vehicle | License plates, VIN numbers | ⭐⭐⭐⭐ |
Books | Printed text, Handwritten notes | ⭐⭐⭐⭐ |
Multilingual | Multiple languages | ⭐⭐⭐ |
Configuration
Model Selection
from models.ocr_processor import OCRProcessor
# Fast inference (recommended)
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# Maximum accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
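If the model choice should come from a setting rather than a hard-coded string, a small lookup can map a speed/accuracy preference to the two model identifiers shown here. The profile names `fast` and `accurate` and the helper `resolve_model_name` are hypothetical, introduced only for this sketch.

```python
# Model identifiers come from the README; the profile names are hypothetical.
MODEL_PROFILES = {
    "fast": "microsoft/Florence-2-base",
    "accurate": "microsoft/Florence-2-large",
}


def resolve_model_name(profile: str = "fast") -> str:
    """Map a profile name to a model identifier, rejecting unknown profiles."""
    try:
        return MODEL_PROFILES[profile]
    except KeyError:
        valid = ", ".join(sorted(MODEL_PROFILES))
        raise ValueError(
            f"unknown profile {profile!r}; expected one of: {valid}"
        ) from None


print(resolve_model_name("accurate"))
# Intended use: OCRProcessor(model_name=resolve_model_name("fast"))
```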
UI Customization
Modify ui/styles.py to customize appearance:
# Change color scheme
PRIMARY_COLOR = "#1f77b4"
SECONDARY_COLOR = "#ff7f0e"
# Update layout
INTERFACE_WIDTH = "100%"
Environment Variables
Variable | Description | Default |
---|---|---|
SPACE_ID | Hugging Face Space ID | Auto-detected |
DEPLOYMENT_STAGE | Deployment stage | production |
TRANSFORMERS_CACHE | Model cache path | ~/.cache/huggingface |
CUDA_VISIBLE_DEVICES | GPU selection | All available |
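As a rough sketch of how an application might read these variables with the listed defaults, the snippet below collects them into a small settings object. The `Settings` class and `load_settings` function are illustrative names, and representing "Auto-detected" and "All available" as `None` is an assumption of this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    space_id: Optional[str]
    deployment_stage: str
    transformers_cache: str
    cuda_visible_devices: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, applying the table's defaults."""
    return Settings(
        space_id=env.get("SPACE_ID"),  # None when not running in a Space
        deployment_stage=env.get("DEPLOYMENT_STAGE", "production"),
        transformers_cache=os.path.expanduser(
            env.get("TRANSFORMERS_CACHE", "~/.cache/huggingface")
        ),
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES"),  # None: all GPUs
    )


settings = load_settings({"DEPLOYMENT_STAGE": "staging"})
print(settings.deployment_stage)
```

Passing the environment as a mapping keeps the function easy to test without modifying `os.environ`.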
Deployment Flow:
graph LR
A[Code Push] --> B[Validate]
B --> C[Deploy Staging]
C --> D[Health Check]
D --> E[Deploy Production]
E --> F[Verify]
F --> G[Complete]
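The flow in the diagram amounts to a sequence of gated stages, where each stage must succeed before the next one runs. The sketch below models that idea only. The stage callables are placeholders that always succeed, and `run_pipeline` is a hypothetical helper, not part of the project's deployment tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first one that reports failure."""
    completed = []
    for name, step in stages:
        if not step():
            raise RuntimeError(f"stage failed: {name}")
        completed.append(name)
    return completed


# Placeholder steps that always succeed; real steps would call deployment tooling.
stages = [
    ("validate", lambda: True),
    ("deploy staging", lambda: True),
    ("health check", lambda: True),
    ("deploy production", lambda: True),
    ("verify", lambda: True),
]
print(run_pipeline(stages))
```

Because a failed health check raises before the production stage runs, the live deployment is never touched by a build that did not pass staging.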
Contributing
We welcome contributions! Here's how to get started:
Development Setup
Fork & Clone
git clone https://github.com/YOUR_USERNAME/textlens-ocr.git
cd textlens-ocr
Create Branch
git checkout -b feature/your-feature-name
Make Changes
- Add new features or fix bugs
- Update tests and documentation
- Follow code style guidelines
Test Changes
python -m pytest tests/
python -c "from models.ocr_processor import OCRProcessor; OCRProcessor()"
Submit PR
git add .
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
Contribution Guidelines
- Code Style: Follow PEP 8, use Black formatter
- Documentation: Update README and docstrings
- Tests: Add tests for new functionality
- Commits: Use conventional commit messages
- Issues: Link PRs to relevant issues
License
This project is licensed under the MIT License. See the LICENSE file for details.
Third-Party Licenses
- Microsoft Florence-2: MIT License
- HuggingFace Transformers: Apache License 2.0
- Gradio: Apache License 2.0
- EasyOCR: Apache License 2.0
Acknowledgments
Special thanks to:
- Microsoft Research for the incredible Florence-2 vision-language model
- HuggingFace for the transformers library and Spaces platform
- Gradio Team for the amazing web interface framework
- JaidedAI for EasyOCR fallback capabilities
- Open Source Community for continuous support and contributions
Project Status
Component | Status | Version |
---|---|---|
Core OCR | Stable | v1.0.0 |
Web UI | Stable | v1.0.0 |
Deployment | Production | v1.0.0 |
API | Stable | v1.0.0 |
Documentation | Complete | v1.0.0 |
Made with ❤️ for the AI community