Upload 3 files
- README.md +29 -25
- requirements.txt +1 -1
README.md
CHANGED
@@ -6,37 +6,41 @@ emoji: 📚
 colorFrom: blue
 colorTo: purple
 thumbnail: >-
-  https://cdn-uploads.huggingface.co/production/uploads/6495d5a915d8ef6f01bc75eb/
-short_description: GPU-Accelerated
+  https://cdn-uploads.huggingface.co/production/uploads/6495d5a915d8ef6f01bc75eb/TSmoqoWGoatq_GLsau_La.png
+short_description: GPU-Accelerated OCR
 ---
-# ZeroGPU Multilingual PDF Text Extractor

-
+# Easy‑OCR · ZeroGPU Multilingual PDF Text Extractor

-
-
-| On‑demand GPU | `@spaces.GPU` wraps only the OCR phase – 0 credits burnt when you choose **native** mode. |
-| Streaming output | Results appear page‑by‑page; no more guessing “is it stuck?”. |
-| Progress bar | Slick Gradio 4 `Progress` widget with pages processed / total. |
-| Language picker | Loads exactly the EasyOCR models you need for sharper accuracy & faster warm‑up. |
-| Modes | **native** (embedded text only), **ocr** (images only), **auto** (mixed). |
-| Download button | Get a `.txt` file of the final output. |
-| UX polish | Two‑column responsive layout, soft purple theme, sample PDFs for instant demo. |
-| Robustness | File‑size guard (200 MB), CUDA OOM retry at lower DPI, unsupported language error message. |
+**Why this Space?**
+All the power of GPU‑accelerated OCR, yet you **only pay for GPU seconds you actually use** – thanks to the HuggingFace **ZeroGPU** backend.

-##
+## 🔑 Key features

-
-
-
-
+| Category | Details |
+|----------|---------|
+| 💡 On‑demand GPU | `@spaces.GPU` wraps only the OCR phase – choose **native** mode and the app never even touches a GPU. |
+| 📝 Hybrid extraction | First pulls native PDF text with **pdfplumber**, then OCRs any remaining images with **EasyOCR**. |
+| 🌍 Multilingual | Pick one or more language codes in the dropdown – the app loads only those EasyOCR models for sharper accuracy and faster warm‑up. |
+| ⚡ Streaming UX | Text appears page‑by‑page with a live progress bar. |
+| 📥 Download | One‑click `.txt` export of the full extraction. |
+| 🛡️ Robust | File‑size guard, CUDA OOM fallback, unsupported‑language warnings. |

-## Deploy
+## 🚀 Deploy your own

-1. Create a
-2. Upload
-3. Commit – first OCR call
+1. **Create** a *Gradio* Space on HuggingFace and select **ZeroGPU** in the *Hardware* dropdown (requires PRO or Inference Credits).
+2. Upload the three project files (`app.py`, `requirements.txt`, `README.md`) *or* just drop in the ZIP bundle.
+3. **Commit** – the Space builds automatically. The first OCR call downloads EasyOCR model weights (~200 MB per language group).

-##
+## 💡 Usage tips

-*
+* Large PDFs can take several minutes; the GPU reservation duration is `600 s`. Tweak the `@spaces.GPU(duration=…)` decorator if needed.
+* For faster queues, lower the duration or split very large documents.
+* When you know your PDF is **text only**, selecting **native** mode skips GPU altogether for near‑instant results.
+
+## 🏗️ Contributing
+
+* Code style: `black app.py` and `ruff app.py`.
+* Test: run `pytest tests/` (sample fixtures provided).
+
+Happy extracting! 📚
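The README describes three extraction modes and says the GPU is reserved only for the OCR phase. The commit does not show `app.py`, so the following is only a minimal sketch of how such a dispatch might look. The function `extract_pages` and its two callables are hypothetical names; in the real app the native step would presumably call pdfplumber and the OCR step would be the one function decorated with `@spaces.GPU`.

```python
from typing import Callable, Iterator, Sequence


def extract_pages(
    pages: Sequence[object],
    mode: str,
    native_text: Callable[[object], str],
    ocr_text: Callable[[object], str],
) -> Iterator[str]:
    """Yield one text chunk per page so a UI can stream results as they arrive.

    `native_text` and `ocr_text` are injected stand-ins (assumptions) for the
    pdfplumber and EasyOCR steps; only `ocr_text` would need a GPU.
    """
    if mode not in {"native", "ocr", "auto"}:
        raise ValueError(f"unknown mode: {mode!r}")
    for page in pages:
        parts = []
        if mode in ("native", "auto"):
            parts.append(native_text(page))
        if mode in ("ocr", "auto"):
            # In "native" mode this branch never runs, so no GPU time is requested.
            parts.append(ocr_text(page))
        # Skip empty results so a page without embedded text yields OCR text only.
        yield "\n".join(part for part in parts if part)
```

Because the function is a generator, a caller can forward each page to the interface as soon as it is ready, which matches the page-by-page streaming the README mentions.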
requirements.txt
CHANGED
@@ -1,4 +1,4 @@
-gradio>=
+gradio>=4.1
 easyocr>=1.7.1
 torch>=2.0
 pdfplumber>=0.10.3
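The README mentions a CUDA OOM fallback (the removed table row called it a retry at lower DPI), and `requirements.txt` pins `torch>=2.0`, which provides `torch.cuda.OutOfMemoryError`. Since `app.py` is not part of this diff, the sketch below only illustrates the general retry pattern. The function name, the DPI ladder `(300, 200, 150)`, and the use of `MemoryError` as a default stand-in exception are all assumptions, chosen so the example runs without torch installed.

```python
from typing import Callable, Sequence, Tuple, Type


def ocr_with_dpi_fallback(
    run_ocr: Callable[[int], str],
    dpis: Sequence[int] = (300, 200, 150),  # assumed values, not from the source
    oom_error: Type[BaseException] = MemoryError,  # stand-in for a CUDA OOM error
) -> Tuple[str, int]:
    """Call run_ocr(dpi) at each DPI in order and return (text, dpi_used).

    Only errors of type `oom_error` trigger a retry; anything else propagates.
    """
    if not dpis:
        raise ValueError("dpis must not be empty")
    last_error = None
    for dpi in dpis:
        try:
            return run_ocr(dpi), dpi
        except oom_error as exc:
            last_error = exc  # remember the failure and try the next DPI
    raise RuntimeError(
        f"OCR ran out of memory even at {dpis[-1]} DPI"
    ) from last_error
```

With torch available, a caller could pass `oom_error=torch.cuda.OutOfMemoryError` so that only genuine GPU memory failures cause a retry.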