# Utils Directory Guide - Format_Resume.py Focus
## **REQUIRED FILES for Format_Resume.py** (10 out of 11 files)
After analyzing the Format_Resume.py functionality with OpenAI GPT-4o as primary and HF Cloud as backup, here are the **essential files**:
```
utils/
├── CORE EXTRACTION SYSTEM (Format_Resume.py dependencies)
│   ├── hybrid_extractor.py      # REQUIRED - Main orchestrator (direct import)
│   ├── openai_extractor.py      # REQUIRED - OpenAI GPT-4o (PRIMARY method)
│   ├── hf_cloud_extractor.py    # REQUIRED - HF Cloud API (BACKUP method)
│   ├── ai_extractor.py          # REQUIRED - Alternative HF AI (fallback)
│   ├── hf_extractor_simple.py   # REQUIRED - Simple HF (fallback)
│   └── extractor_fixed.py       # REQUIRED - Regex fallback (last resort)
│
├── DOCUMENT PROCESSING (Format_Resume.py dependencies)
│   ├── builder.py               # REQUIRED - Resume document generation with header/footer preservation
│   └── parser.py                # REQUIRED - PDF/DOCX text extraction (direct import)
│
└── REFERENCE DATA (Required for fallback system)
    └── data/                    # REQUIRED - Used by extractor_fixed.py fallback
        ├── job_titles.json      # REQUIRED - Job title patterns for regex extraction
        └── skills.json          # REQUIRED - Skills matching for spaCy extraction
```
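As a quick sanity check, the list of required files can be verified before the app starts. The sketch below is only an illustration: the file names come from the tree above, while the function name `missing_required_files` and the idea of running such a check are assumptions, not part of the project.

```python
from pathlib import Path

# File names are taken from the tree above; the check itself is a
# hypothetical helper, not something the project is known to ship.
REQUIRED_FILES = [
    "hybrid_extractor.py",
    "openai_extractor.py",
    "hf_cloud_extractor.py",
    "ai_extractor.py",
    "hf_extractor_simple.py",
    "extractor_fixed.py",
    "builder.py",
    "parser.py",
    "data/job_titles.json",
    "data/skills.json",
]


def missing_required_files(utils_dir):
    """Return the required files that are absent under utils_dir."""
    root = Path(utils_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_required_files("utils")` from the project root would return an empty list when all ten files are in place.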
## **Dependency Chain for Format_Resume.py**
```
pages/Format_Resume.py
├── utils/hybrid_extractor.py (DIRECT IMPORT - orchestrator)
│   ├── utils/openai_extractor.py (PRIMARY GPT-4o - best accuracy)
│   ├── utils/hf_cloud_extractor.py (BACKUP - good accuracy)
│   ├── utils/ai_extractor.py (alternative backup)
│   ├── utils/hf_extractor_simple.py (simple backup)
│   └── utils/extractor_fixed.py (regex fallback) → uses data/job_titles.json & data/skills.json
├── utils/builder.py (DIRECT IMPORT - document generation with template preservation)
└── utils/parser.py (DIRECT IMPORT - file parsing)
```
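The ordering in this chain amounts to a "try each method until one works" loop. The following is a minimal sketch of that pattern, not the actual contents of `hybrid_extractor.py`: the function `extract_with_fallback` and the three stand-in extractors are hypothetical, and the real modules may signal failure differently.

```python
from typing import Callable, Optional

# An extractor takes raw resume text and returns a dict of fields.
# Here, raising an exception or returning an empty result counts as failure
# (an assumption; the real extractors may report failure another way).
Extractor = Callable[[str], Optional[dict]]


def extract_with_fallback(text, extractors):
    """Try each (name, extractor) pair in order; return the first usable result."""
    errors = []
    for name, extractor in extractors:
        try:
            result = extractor(text)
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extraction methods failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real modules, for demonstration only.
def failing_openai(text):
    raise ConnectionError("API key not configured")


def empty_hf_cloud(text):
    return {}


def regex_fallback(text):
    first_line = text.strip().splitlines()[0]
    return {"name": first_line}


chain = [
    ("openai", failing_openai),
    ("hf_cloud", empty_hf_cloud),
    ("regex", regex_fallback),
]
method, data = extract_with_fallback("Jane Doe\nSoftware Engineer", chain)
print(method, data)
```

In this run the first two stand-ins fail, so the result comes from the last entry in the chain. Returning the method name alongside the data makes it easy to show the user which extractor produced the result.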
## **File Purposes for Format_Resume.py**
### **REQUIRED - Core Extraction System**
| File | Purpose | When Used | Priority |
|------|---------|-----------|----------|
| `hybrid_extractor.py` | **Main entry point** - orchestrates all extraction methods | Always (Format_Resume.py imports this) | CRITICAL |
| `openai_extractor.py` | **PRIMARY AI** - OpenAI GPT-4o extraction with contact info | When `use_openai=True` (best results) | PRIMARY |
| `hf_cloud_extractor.py` | **BACKUP AI** - Hugging Face Cloud API extraction | When OpenAI fails or is unavailable | BACKUP |
| `ai_extractor.py` | **Alternative AI** - HF AI models extraction | Alternative backup method | FALLBACK |
| `hf_extractor_simple.py` | **Simple AI** - Simplified local processing | When cloud APIs fail | FALLBACK |
| `extractor_fixed.py` | **Reliable fallback** - Regex-based extraction with spaCy | When all AI methods fail | LAST RESORT |
### **REQUIRED - Document Processing**
| File | Purpose | When Used | Priority |
|------|---------|-----------|----------|
| `builder.py` | **Document generation** - Creates formatted Word docs with preserved headers/footers | Always (Format_Resume.py imports this) | CRITICAL |
| `parser.py` | **File parsing** - Extracts raw text from PDF/DOCX files | Always (Format_Resume.py imports this) | CRITICAL |
### **REQUIRED - Reference Data**
| File | Purpose | When Used | Priority |
|------|---------|-----------|----------|
| `data/job_titles.json` | **Job title patterns** - Used by extractor_fixed.py for regex matching | When all AI methods fail (fallback) | BACKUP |
| `data/skills.json` | **Skills database** - Used by extractor_fixed.py for spaCy skill matching | When all AI methods fail (fallback) | BACKUP |
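To give a feel for how reference lists like these can drive a last-resort extractor, here is a rough sketch of whole-word term matching. It rests on several assumptions: the JSON files are treated as flat arrays of strings (their real structure is not shown here), the helper `find_terms` is hypothetical, and plain regular expressions stand in for the spaCy-based skill matching that `extractor_fixed.py` is described as using.

```python
import json
import re


def find_terms(text, terms):
    """Return the terms that occur in text as whole words, case-insensitively."""
    found = []
    for term in terms:
        # Lookarounds instead of \b so terms ending in symbols (e.g. "C++") match.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Assumed file shape: a flat JSON array of strings. The real
# data/job_titles.json and data/skills.json may be structured differently.
job_titles = json.loads('["Software Engineer", "Data Analyst"]')
skills = json.loads('["Python", "SQL", "C++", "Java"]')

resume_text = "Senior Software Engineer with 5 years of Python, C++ and JavaScript."
print(find_terms(resume_text, job_titles))
print(find_terms(resume_text, skills))
```

The whole-word check keeps "Java" from matching inside "JavaScript", which is the kind of false positive a naive substring search would produce.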
## **Format_Resume.py Extraction Flow**
```
1. User uploads resume → parser.py extracts raw text
2. hybrid_extractor.py orchestrates extraction:
   ├── Try openai_extractor.py (PRIMARY GPT-4o - best accuracy)
   ├── If fails → Try hf_cloud_extractor.py (BACKUP - good accuracy)
   ├── If fails → Try ai_extractor.py (alternative backup)
   ├── If fails → Try hf_extractor_simple.py (simple backup)
   └── If all fail → Use extractor_fixed.py (regex fallback) → uses data/*.json
3. builder.py generates formatted Word document with preserved template headers/footers
4. User downloads formatted resume with Qvell branding and proper formatting