CultriX committed on Jun 18

Commit 1cd2200 (verified) · 1 parent: 6785b0f

Upload 3 files

Files changed (2)

README.md +29 -25
requirements.txt +1 -1

README.md CHANGED

@@ -6,37 +6,41 @@ emoji: 📚
 colorFrom: blue
 colorTo: purple
 thumbnail: >-
-  https://cdn-uploads.huggingface.co/production/uploads/6495d5a915d8ef6f01bc75eb/SIMWE36Au--DpkzLgq9MI.png
-short_description: GPU-Accelerated Multi-Lingual OCR
 ---
-# ZeroGPU Multilingual PDF Text Extractor
-This Space marries **speed**, **accuracy**, and a polished **UX**:
-| Capability | How |
-|------------|-----|
-| On‑demand GPU | `@spaces.GPU` wraps only the OCR phase – 0 credits burnt when you choose **native** mode. |
-| Streaming output | Results appear page‑by‑page; no more guessing “is it stuck?”. |
-| Progress bar | Slick Gradio 4 `Progress` widget with pages processed / total. |
-| Language picker | Loads exactly the EasyOCR models you need for sharper accuracy & faster warm‑up. |
-| Modes | **native** (embedded text only), **ocr** (images only), **auto** (mixed). |
-| Download button | Get a `.txt` file of the final output. |
-| UX polish | Two‑column responsive layout, soft purple theme, sample PDFs for instant demo. |
-| Robustness | File‑size guard (200 MB), CUDA OOM retry at lower DPI, unsupported language error message. |
-## Running locally
-```bash
-pip install -r requirements.txt
-python app.py
-```
-## Deploy on HuggingFace
-1. Create a **Gradio** Space and pick **ZeroGPU** hardware.
-2. Upload these files or the ZIP bundle.
-3. Commit – first OCR call will download model weights (~200 MB each language family).
-## Maintainers
-*Run `black app.py && ruff app.py` before committing to stay stylish.*

 colorFrom: blue
 colorTo: purple
 thumbnail: >-
+  https://cdn-uploads.huggingface.co/production/uploads/6495d5a915d8ef6f01bc75eb/TSmoqoWGoatq_GLsau_La.png
+short_description: GPU-Accelerated OCR
 ---
+# Easy‑OCR · ZeroGPU Multilingual PDF Text Extractor
+**Why this Space?**
+All the power of GPU‑accelerated OCR, yet you **only pay for GPU seconds you actually use** – thanks to the HuggingFace **ZeroGPU** backend.
+## 🔑 Key features
+| Category | Details |
+|----------|---------|
+| 💡 On‑demand GPU | `@spaces.GPU` wraps only the OCR phase – choose **native** mode and the app never even touches a GPU. |
+| 📝 Hybrid extraction | First pulls native PDF text with **pdfplumber**, then OCRs any remaining images with **EasyOCR**. |
+| 🌍 Multilingual | Pick one or more language codes in the dropdown – the app loads only those EasyOCR models for sharper accuracy and faster warm‑up. |
+| ⚡ Streaming UX | Text appears page‑by‑page with a live progress bar. |
+| 📥 Download | One‑click `.txt` export of the full extraction. |
+| 🛡️ Robust | File‑size guard, CUDA OOM fallback, unsupported‑language warnings. |
+## 🚀 Deploy your own
+1. **Create** a *Gradio* Space on HuggingFace and select **ZeroGPU** in the *Hardware* dropdown (requires PRO or Inference Credits).
+2. Upload the three project files (`app.py`, `requirements.txt`, `README.md`) *or* just drop in the ZIP bundle.
+3. **Commit** – the Space builds automatically. The first OCR call downloads EasyOCR model weights (~200 MB per language group).
+## 💡 Usage tips
+* Large PDFs can take several minutes; the GPU reservation duration is `600 s`. Tweak the `@spaces.GPU(duration=…)` decorator if needed.
+* For faster queues, lower the duration or split very large documents.
+* When you know your PDF is **text only**, selecting **native** mode skips GPU altogether for near‑instant results.
+## 🏗️ Contributing
+* Code style: `black app.py` and `ruff app.py`.
+* Test: run `pytest tests/` (sample fixtures provided).
+Happy extracting! 📚
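
The "On‑demand GPU" and "Hybrid extraction" rows in the new README describe a dispatch rule: use native PDF text where it exists and call OCR only when needed, so that **native** mode never reaches the GPU. The commit does not include `app.py`, so the following is only a minimal sketch of that rule, not the Space's actual code. The function name `extract_pages`, the per-page fallback, and the injected `ocr_page` callable are all assumptions made for illustration; in a real ZeroGPU app the callable passed as `ocr_page` would presumably be the function wrapped with `@spaces.GPU(duration=…)`.

```python
from typing import Callable, Iterable, Iterator, Optional


def extract_pages(
    native_texts: Iterable[Optional[str]],
    mode: str,
    ocr_page: Callable[[int], str],
) -> Iterator[str]:
    """Yield one text per page, calling ocr_page only when the mode needs it.

    native_texts: embedded text per page (None or blank if the page has none).
    mode: "native", "ocr", or "auto" (mode names taken from the old README).
    ocr_page: hypothetical callable that OCRs the page at the given index.
    """
    if mode not in {"native", "ocr", "auto"}:
        raise ValueError(f"unknown mode: {mode!r}")

    for index, native in enumerate(native_texts):
        has_native = bool(native and native.strip())
        if mode == "native":
            # Never call OCR here, so no GPU time is requested.
            yield native or ""
        elif mode == "ocr" or not has_native:
            # Forced OCR, or "auto" on a page with no embedded text.
            yield ocr_page(index)
        else:
            # "auto" on a page that already has embedded text.
            yield native
```

Because the function is a generator, a caller can stream each page's text to the UI as it arrives, which matches the page‑by‑page behaviour the README describes.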

requirements.txt CHANGED

@@ -1,4 +1,4 @@
-gradio>=5.34.1
 easyocr>=1.7.1
 torch>=2.0
 pdfplumber>=0.10.3

+gradio>=4.1
 easyocr>=1.7.1
 torch>=2.0
 pdfplumber>=0.10.3