Commit 1cd2200 by CultriX (verified)
1 Parent(s): 6785b0f

Upload 3 files

Files changed (2)
  1. README.md +29 -25
  2. requirements.txt +1 -1
README.md CHANGED
@@ -6,37 +6,41 @@ emoji: 📚
  colorFrom: blue
  colorTo: purple
  thumbnail: >-
- https://cdn-uploads.huggingface.co/production/uploads/6495d5a915d8ef6f01bc75eb/SIMWE36Au--DpkzLgq9MI.png
- short_description: GPU-Accelerated Multi-Lingual OCR
+ https://cdn-uploads.huggingface.co/production/uploads/6495d5a915d8ef6f01bc75eb/TSmoqoWGoatq_GLsau_La.png
+ short_description: GPU-Accelerated OCR
  ---
- # ZeroGPU Multilingual PDF Text Extractor
 
- This Space marries **speed**, **accuracy**, and a polished **UX**:
+ # Easy‑OCR · ZeroGPU Multilingual PDF Text Extractor
 
- | Capability | How |
- |------------|-----|
- | On‑demand GPU | `@spaces.GPU` wraps only the OCR phase – 0 credits burnt when you choose **native** mode. |
- | Streaming output | Results appear page‑by‑page; no more guessing “is it stuck?”. |
- | Progress bar | Slick Gradio 4 `Progress` widget with pages processed / total. |
- | Language picker | Loads exactly the EasyOCR models you need for sharper accuracy & faster warm‑up. |
- | Modes | **native** (embedded text only), **ocr** (images only), **auto** (mixed). |
- | Download button | Get a `.txt` file of the final output. |
- | UX polish | Two‑column responsive layout, soft purple theme, sample PDFs for instant demo. |
- | Robustness | File‑size guard (200 MB), CUDA OOM retry at lower DPI, unsupported language error message. |
+ **Why this Space?**
+ All the power of GPU‑accelerated OCR, yet you **only pay for GPU seconds you actually use** – thanks to the HuggingFace **ZeroGPU** backend.
 
- ## Running locally
+ ## 🔑 Key features
 
- ```bash
- pip install -r requirements.txt
- python app.py
- ```
+ | Category | Details |
+ |----------|---------|
+ | 💡 On‑demand GPU | `@spaces.GPU` wraps only the OCR phase – choose **native** mode and the app never even touches a GPU. |
+ | 📝 Hybrid extraction | First pulls native PDF text with **pdfplumber**, then OCRs any remaining images with **EasyOCR**. |
+ | 🌍 Multilingual | Pick one or more language codes in the dropdown – the app loads only those EasyOCR models for sharper accuracy and faster warm‑up. |
+ | ⚡ Streaming UX | Text appears page‑by‑page with a live progress bar. |
+ | 📥 Download | One‑click `.txt` export of the full extraction. |
+ | 🛡️ Robust | File‑size guard, CUDA OOM fallback, unsupported‑language warnings. |
 
- ## Deploy on HuggingFace
+ ## 🚀 Deploy your own
 
- 1. Create a **Gradio** Space and pick **ZeroGPU** hardware.
- 2. Upload these files or the ZIP bundle.
- 3. Commit – first OCR call will download model weights (~200 MB each language family).
+ 1. **Create** a *Gradio* Space on HuggingFace and select **ZeroGPU** in the *Hardware* dropdown (requires PRO or Inference Credits).
+ 2. Upload the three project files (`app.py`, `requirements.txt`, `README.md`) *or* just drop in the ZIP bundle.
+ 3. **Commit** – the Space builds automatically. The first OCR call downloads EasyOCR model weights (~200 MB per language group).
 
- ## Maintainers
+ ## 💡 Usage tips
 
- *Run `black app.py && ruff app.py` before committing to stay stylish.*
+ * Large PDFs can take several minutes; the GPU reservation duration is `600 s`. Tweak the `@spaces.GPU(duration=…)` decorator if needed.
+ * For faster queues, lower the duration or split very large documents.
+ * When you know your PDF is **text only**, selecting **native** mode skips GPU altogether for near‑instant results.
+
+ ## 🏗️ Contributing
+
+ * Code style: `black app.py` and `ruff app.py`.
+ * Test: run `pytest tests/` (sample fixtures provided).
+
+ Happy extracting! 📚
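The mode handling the README describes (**native** for embedded text, **ocr** for images, **auto** for mixed documents, with results streamed page by page) could be sketched roughly as follows. This is a minimal illustration, not the Space's actual `app.py`: `extract_pages`, `native_text`, and `ocr_text` are hypothetical names, the two backends are injected as plain callables so the dispatch logic runs without pdfplumber or EasyOCR installed, and the rule that **auto** falls back to OCR only when a page has no embedded text is an assumption.

```python
from typing import Callable, Iterable, Iterator

# A backend takes a page object and returns its text (or None if nothing found).
Extractor = Callable[[object], str]


def extract_pages(
    pages: Iterable[object],
    mode: str,
    native_text: Extractor,
    ocr_text: Extractor,
) -> Iterator[str]:
    """Yield text one page at a time so a UI can stream partial results."""
    if mode not in {"native", "ocr", "auto"}:
        raise ValueError(f"unknown mode: {mode!r}")
    for page in pages:
        # "ocr" mode skips the native pass entirely.
        text = "" if mode == "ocr" else (native_text(page) or "").strip()
        # Assumption: "auto" runs OCR only when no embedded text was found.
        if mode == "ocr" or (mode == "auto" and not text):
            text = (ocr_text(page) or "").strip()
        yield text
```

In a real app, `native_text` might wrap pdfplumber's `page.extract_text()`, and `ocr_text` might call an EasyOCR `Reader.readtext(...)` inside a function decorated with `@spaces.GPU(duration=…)`. Because **native** mode never calls `ocr_text`, a design like this would never request a GPU in that mode.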
requirements.txt CHANGED
@@ -1,4 +1,4 @@
- gradio>=5.34.1
+ gradio>=4.1
  easyocr>=1.7.1
  torch>=2.0
  pdfplumber>=0.10.3
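
This hunk lowers the Gradio floor from `5.34.1` to `4.1` and leaves the other pins unchanged. One way to check that a local environment meets these floors before running `app.py` is sketched below. The helper names (`parse_release`, `meets_floor`, `check_environment`) are hypothetical, and the version parsing is deliberately simplified (numeric dotted releases only, ignoring pre-release and local tags), so treat it as an illustration rather than a substitute for pip's resolver.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the updated requirements.txt.
FLOORS = {
    "gradio": "4.1",
    "easyocr": "1.7.1",
    "torch": "2.0",
    "pdfplumber": "0.10.3",
}


def parse_release(v: str) -> tuple[int, ...]:
    """Parse the leading numeric release, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    parts = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_floor(installed: str, floor: str) -> bool:
    """Compare two releases after padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(floor)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(floors: dict[str, str] = FLOORS) -> dict[str, str]:
    """Return a status per package: 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, floor in floors.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_floor(found, floor) else f"too old ({found})"
    return report
```

Calling `check_environment()` returns a dictionary keyed by package name, which can be printed or asserted on in a startup check.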