---
title: TextLens - AI-Powered OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---
# 🔍 TextLens - AI-Powered OCR
[![Deploy to HuggingFace](https://img.shields.io/badge/🤗-Deploy%20to%20Spaces-blue)](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/KumarAmrit30/textlens-ocr)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
TextLens is an OCR application built on a vision-language model (VLM). It extracts text from images with Microsoft Florence-2, falls back to EasyOCR automatically if the primary model fails, and supports zero-downtime (blue-green) deployment.
## 🚀 Live Demo
**🔗 Try it now:** [https://huggingface.co/spaces/GoConqurer/textlens-ocr](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
![TextLens Demo](https://img.shields.io/badge/Demo-Live-brightgreen)
## ✨ Key Features
### 🤖 Advanced AI-Powered OCR
- **Microsoft Florence-2 VLM**: State-of-the-art vision-language model for text extraction
- **Intelligent Fallback System**: Automatic fallback to EasyOCR if primary model fails
- **Multi-Model Support**: Florence-2-base and Florence-2-large variants
- **Real-time Processing**: Instant text extraction on image upload
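The fallback behaviour listed above can be pictured as a chain of OCR backends tried in order. The following is a minimal sketch, not the project's actual implementation: `extract_with_fallback` and both stand-in backends are hypothetical names, and the real Florence-2 and EasyOCR calls are replaced by plain functions.

```python
from typing import Callable, List, Sequence, Tuple

# Each backend is a (name, function) pair; the function takes image bytes
# and returns the extracted text, raising an exception on failure.
Backend = Tuple[str, Callable[[bytes], str]]


def extract_with_fallback(image: bytes, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Return (backend_name, text) from the first backend that succeeds."""
    errors: List[str] = []
    for name, run in backends:
        try:
            return name, run(image)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all OCR backends failed: " + "; ".join(errors))


# Stand-ins for the real models (hypothetical, for illustration only).
def florence_stub(image: bytes) -> str:
    raise RuntimeError("primary model unavailable")


def easyocr_stub(image: bytes) -> str:
    return "hello world"


backend_used, text = extract_with_fallback(
    b"fake-image-bytes",
    [("florence-2", florence_stub), ("easyocr", easyocr_stub)],
)
print(backend_used, text)  # easyocr hello world
```

Keeping the backends as an ordered list means a third engine could be added without touching the retry logic.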
### 🎨 Modern User Experience
- **Clean UI**: Professional Gradio interface with intuitive design
- **Multiple Input Methods**: Upload files, use webcam, or paste from clipboard
- **Copy-to-Clipboard**: One-click text copying functionality
- **Responsive Design**: Works seamlessly on desktop and mobile devices
- **Dark/Light Theme**: Automatic theme adaptation
### ⚡ Performance & Reliability
- **GPU Acceleration**: Supports CUDA, MPS (Apple Silicon), and CPU inference
- **Smart Device Detection**: Automatically uses best available hardware
- **Error Resilience**: Robust error handling with graceful degradation
- **Memory Optimization**: Efficient model loading and cleanup
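Device detection of the kind described above usually comes down to a fixed preference order: CUDA, then MPS, then CPU. The sketch below assumes that order. `pick_device` and `detect_device` are hypothetical helper names rather than functions from this repository; `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the standard PyTorch checks.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the preference order CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available hardware; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)  # absent on old PyTorch builds
    mps_available = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


print(detect_device())
```

Separating the decision (`pick_device`) from the hardware query makes the preference order easy to test without a GPU.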
### 🛡️ Enterprise Features
- **Zero Downtime Deployment**: Blue-green deployment with health checks
- **Health Monitoring**: Built-in `/health` and `/ready` endpoints
- **Graceful Shutdown**: Signal handling for clean application restarts
- **Production Ready**: Scalable architecture with automated deployment
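As a rough illustration of how `/health` and `/ready` endpoints typically differ, here is a standard-library sketch that is independent of Gradio. It assumes `/health` reports liveness only, while `/ready` returns 503 until the model is loaded and again once shutdown begins. `AppState`, `route`, and `make_server` are hypothetical names; how the project wires its real endpoints is not shown in this README.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Tuple


class AppState:
    """Tracks whether the app can accept traffic (hypothetical helper)."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def mark_ready(self) -> None:
        self._ready.set()  # call once the model has finished loading

    def begin_shutdown(self) -> None:
        self._ready.clear()  # stop advertising readiness before exiting

    @property
    def ready(self) -> bool:
        return self._ready.is_set()


def route(path: str, state: AppState) -> Tuple[int, Dict[str, str]]:
    """Map a request path to (HTTP status, JSON payload)."""
    if path == "/health":
        return 200, {"status": "ok"}  # the process is alive
    if path == "/ready":
        if state.ready:
            return 200, {"status": "ready"}
        return 503, {"status": "not ready"}
    return 404, {"error": "not found"}


def make_server(state: AppState, port: int = 0) -> ThreadingHTTPServer:
    """Build a server on localhost; port 0 lets the OS pick a free port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            code, payload = route(self.path, state)
            body = json.dumps(payload).encode("utf-8")
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

To serve it, call `make_server(state, 8080).serve_forever()` (the port is an arbitrary choice). A SIGTERM handler registered with `signal.signal` could call `state.begin_shutdown()` so that `/ready` starts returning 503 before the process exits, which is what lets a blue-green switch drain traffic cleanly.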
## πŸš€ Quick Start
### 🌐 Online (Recommended)
**Instant access** - No installation required:
👉 [**Launch TextLens**](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
### 💻 Local Development
1. **Clone Repository**
```bash
git clone https://github.com/KumarAmrit30/textlens-ocr.git
cd textlens-ocr
```
2. **Setup Environment**
```bash
python -m venv textlens_env
source textlens_env/bin/activate # Windows: textlens_env\Scripts\activate
pip install -r requirements.txt
```
3. **Launch Application**
```bash
python app.py
```
🌐 Open: `http://localhost:7860`
### πŸ§ͺ Quick Test
```bash
# Verify installation
python -c "from models.ocr_processor import OCRProcessor; print('✅ TextLens ready!')"
```
## 📊 Model Performance
| Model | Parameters | Speed | Accuracy | Best For |
| -------------------- | ---------- | --------- | ------------ | ---------------------- |
| **Florence-2-base** | 230M | ⚡ Fast | 📈 High | General OCR, Real-time |
| **Florence-2-large** | 770M | 🐌 Medium | 📊 Very High | High accuracy needs |
| **EasyOCR** | ~100M | 🚀 Medium | 📋 Good | Fallback, Multilingual |
## 🎯 Supported Use Cases
| Category | Examples | Performance |
| ------------------- | ------------------------------- | ----------- |
| 📄 **Documents** | PDFs, Scanned papers, Forms | ⭐⭐⭐⭐⭐ |
| 🧾 **Receipts** | Shopping receipts, Invoices | ⭐⭐⭐⭐ |
| 📱 **Screenshots** | App interfaces, Error messages | ⭐⭐⭐⭐⭐ |
| 🚗 **Vehicle** | License plates, VIN numbers | ⭐⭐⭐⭐ |
| 📚 **Books** | Printed text, Handwritten notes | ⭐⭐⭐⭐ |
| 🌐 **Multilingual** | Multiple languages | ⭐⭐⭐ |
## 🔧 Configuration
### 🎛️ Model Selection
```python
from models.ocr_processor import OCRProcessor
# Fast inference (recommended)
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# Maximum accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
```
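If the model choice needs to come from a setting rather than a hard-coded string, a small lookup keeps the two checkpoint IDs in one place. This is only a sketch: the profile names `"fast"` and `"accurate"` and the helper `resolve_model_name` are assumptions made for illustration, not part of the project's API.

```python
# Checkpoint IDs taken from the examples above; the profile keys are hypothetical.
FLORENCE_MODELS = {
    "fast": "microsoft/Florence-2-base",
    "accurate": "microsoft/Florence-2-large",
}


def resolve_model_name(profile: str = "fast") -> str:
    """Translate a profile name into a Florence-2 checkpoint ID."""
    try:
        return FLORENCE_MODELS[profile]
    except KeyError:
        valid = ", ".join(sorted(FLORENCE_MODELS))
        raise ValueError(f"unknown profile {profile!r}; expected one of: {valid}") from None


print(resolve_model_name("accurate"))  # microsoft/Florence-2-large
```

The result can then be passed along as `OCRProcessor(model_name=resolve_model_name("accurate"))`.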
### 🎨 UI Customization
Modify `ui/styles.py` to customize appearance:
```python
# Change color scheme
PRIMARY_COLOR = "#1f77b4"
SECONDARY_COLOR = "#ff7f0e"
# Update layout
INTERFACE_WIDTH = "100%"
```
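One way constants like these could reach the browser is as a CSS string handed to Gradio. The sketch below assumes that approach. `build_css` and the `--textlens-*` custom property names are hypothetical, and the actual contents of `ui/styles.py` may be organised differently.

```python
PRIMARY_COLOR = "#1f77b4"
SECONDARY_COLOR = "#ff7f0e"
INTERFACE_WIDTH = "100%"


def build_css(primary: str, secondary: str, width: str) -> str:
    """Render the style constants as a CSS rule for the Gradio container."""
    return (
        ".gradio-container {\n"
        f"  max-width: {width} !important;\n"
        f"  --textlens-primary: {primary};\n"      # hypothetical custom property
        f"  --textlens-secondary: {secondary};\n"  # hypothetical custom property
        "}\n"
    )


css = build_css(PRIMARY_COLOR, SECONDARY_COLOR, INTERFACE_WIDTH)
print(css)
```

The resulting string could be supplied through the `css` argument of `gr.Blocks`, for example `gr.Blocks(css=css)`.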
### ⚙️ Environment Variables
| Variable | Description | Default |
| ---------------------- | -------------------- | ---------------------- |
| `SPACE_ID` | HuggingFace Space ID | Auto-detected |
| `DEPLOYMENT_STAGE` | Deployment stage | `production` |
| `TRANSFORMERS_CACHE` | Model cache path | `~/.cache/huggingface` |
| `CUDA_VISIBLE_DEVICES` | GPU selection | All available |
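For reference, the variables in the table could be read into a single settings object along these lines. This is a sketch under stated assumptions: the `Settings` dataclass and `load_settings` are hypothetical names, and unset values for `SPACE_ID` and `CUDA_VISIBLE_DEVICES` are represented as `None`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    space_id: Optional[str]              # None when not running on a Space
    deployment_stage: str
    transformers_cache: str
    cuda_visible_devices: Optional[str]  # None means all GPUs are visible


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the defaults from the table."""
    cache = env.get("TRANSFORMERS_CACHE", "~/.cache/huggingface")
    return Settings(
        space_id=env.get("SPACE_ID"),
        deployment_stage=env.get("DEPLOYMENT_STAGE", "production"),
        transformers_cache=os.path.expanduser(cache),  # resolve the leading ~
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES"),
    )


settings = load_settings({"DEPLOYMENT_STAGE": "staging"})
print(settings.deployment_stage)  # staging
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests.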
**Deployment Flow:**
```mermaid
graph LR
A[Code Push] --> B[Validate]
B --> C[Deploy Staging]
C --> D[Health Check]
D --> E[Deploy Production]
E --> F[Verify]
F --> G[Complete ✅]
```
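The flow in the diagram amounts to running stages in order and stopping at the first one that fails, so a failed health check keeps production untouched. A minimal sketch of that gating logic follows; `run_pipeline` and the lambda stages are hypothetical stand-ins, not the project's deployment scripts.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (name, step) pair; the step returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: Sequence[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first failure.

    Returns (succeeded, names_of_completed_stages).
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed


stages = [
    ("validate", lambda: True),
    ("deploy_staging", lambda: True),
    ("health_check", lambda: False),  # simulate a failing health check
    ("deploy_production", lambda: True),
    ("verify", lambda: True),
]

ok, done = run_pipeline(stages)
print(ok, done)  # False ['validate', 'deploy_staging']
```

Because the production stage never runs after a failed check, the previously deployed version keeps serving traffic.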
## 🤝 Contributing
We welcome contributions! Here's how to get started:
### 🔧 Development Setup
1. **Fork & Clone**
```bash
git clone https://github.com/YOUR_USERNAME/textlens-ocr.git
cd textlens-ocr
```
2. **Create Branch**
```bash
git checkout -b feature/your-feature-name
```
3. **Make Changes**
- Add new features or fix bugs
- Update tests and documentation
- Follow code style guidelines
4. **Test Changes**
```bash
python -m pytest tests/
python -c "from models.ocr_processor import OCRProcessor; OCRProcessor()"
```
5. **Submit PR**
```bash
git add .
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
```
### 📝 Contribution Guidelines
- **Code Style**: Follow PEP 8 and format code with Black
- **Documentation**: Update README and docstrings
- **Tests**: Add tests for new functionality
- **Commits**: Use conventional commit messages
- **Issues**: Link PRs to relevant issues
## 📄 License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
### 🙏 Third-Party Licenses
- **Microsoft Florence-2**: [MIT License](https://github.com/microsoft/Florence)
- **HuggingFace Transformers**: [Apache License 2.0](https://github.com/huggingface/transformers)
- **Gradio**: [Apache License 2.0](https://github.com/gradio-app/gradio)
- **EasyOCR**: [Apache License 2.0](https://github.com/JaidedAI/EasyOCR)
## 🌟 Acknowledgments
Special thanks to:
- **Microsoft Research** for the incredible Florence-2 vision-language model
- **HuggingFace** for the transformers library and Spaces platform
- **Gradio Team** for the amazing web interface framework
- **JaidedAI** for EasyOCR fallback capabilities
- **Open Source Community** for continuous support and contributions
## 📈 Project Status
| Component | Status | Version |
| ----------------- | ------------- | ------- |
| **Core OCR** | ✅ Stable | v1.0.0 |
| **Web UI** | ✅ Stable | v1.0.0 |
| **Deployment** | ✅ Production | v1.0.0 |
| **API** | ✅ Stable | v1.0.0 |
| **Documentation** | ✅ Complete | v1.0.0 |
### 📊 Stats
![GitHub stars](https://img.shields.io/github/stars/KumarAmrit30/textlens-ocr?style=social)
![GitHub forks](https://img.shields.io/github/forks/KumarAmrit30/textlens-ocr?style=social)
![GitHub watchers](https://img.shields.io/github/watchers/KumarAmrit30/textlens-ocr?style=social)
---
<div align="center">
**Made with ❤️ for the AI community**
[⭐ Star this repo](https://github.com/KumarAmrit30/textlens-ocr) • [🔗 Try the demo](https://huggingface.co/spaces/GoConqurer/textlens-ocr) • [📖 Read docs](DEPLOYMENT.md)
</div>