# 📁 Utils Directory Guide - Format_Resume.py Focus

## 🎯 **REQUIRED FILES for Format_Resume.py** (10 out of 11 files)

Format_Resume.py uses OpenAI GPT-4o as its primary extraction method and the Hugging Face (HF) Cloud API as backup. These are the **essential files** it depends on:

```
utils/
├── 🎯 CORE EXTRACTION SYSTEM (Format_Resume.py dependencies)
│   ├── hybrid_extractor.py      # ⭐ REQUIRED - Main orchestrator (direct import)
│   ├── openai_extractor.py      # ⭐ REQUIRED - OpenAI GPT-4o (PRIMARY method)
│   ├── hf_cloud_extractor.py    # ⭐ REQUIRED - HF Cloud API (BACKUP method)
│   ├── ai_extractor.py          # ⭐ REQUIRED - Alternative HF AI (fallback)
│   ├── hf_extractor_simple.py   # ⭐ REQUIRED - Simple HF (fallback)
│   └── extractor_fixed.py       # ⭐ REQUIRED - Regex fallback (last resort)
│
├── 🏗️ DOCUMENT PROCESSING (Format_Resume.py dependencies)
│   ├── builder.py               # ⭐ REQUIRED - Resume document generation with header/footer preservation
│   └── parser.py                # ⭐ REQUIRED - PDF/DOCX text extraction (direct import)
│
└── 📊 REFERENCE DATA (Required for fallback system)
    └── data/                    # ⭐ REQUIRED - Used by extractor_fixed.py fallback
        ├── job_titles.json      # ⭐ REQUIRED - Job title patterns for regex extraction
        └── skills.json          # ⭐ REQUIRED - Skills matching for spaCy extraction
```
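
A quick way to confirm that a checkout contains every file listed above is a small presence check. This is a minimal sketch: the file names come from the tree, while `REQUIRED_FILES` and `find_missing_files` are hypothetical names introduced here for illustration and are not part of the project.

```python
from pathlib import Path

# File names taken from the directory tree above.
# The constant and the helper below are illustrative, not project code.
REQUIRED_FILES = [
    "hybrid_extractor.py",
    "openai_extractor.py",
    "hf_cloud_extractor.py",
    "ai_extractor.py",
    "hf_extractor_simple.py",
    "extractor_fixed.py",
    "builder.py",
    "parser.py",
    "data/job_titles.json",
    "data/skills.json",
]


def find_missing_files(utils_dir):
    """Return the required files that are absent from utils_dir."""
    root = Path(utils_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `find_missing_files("utils")` from the project root returns an empty list when all ten files are in place.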

## 🔗 **Dependency Chain for Format_Resume.py**

```
pages/Format_Resume.py
├── utils/hybrid_extractor.py (DIRECT IMPORT - orchestrator)
│   ├── utils/openai_extractor.py (PRIMARY GPT-4o - best accuracy)
│   ├── utils/hf_cloud_extractor.py (BACKUP - good accuracy)
│   ├── utils/ai_extractor.py (alternative backup)
│   ├── utils/hf_extractor_simple.py (simple backup)
│   └── utils/extractor_fixed.py (regex fallback) → uses data/job_titles.json & data/skills.json
├── utils/builder.py (DIRECT IMPORT - document generation with template preservation)
└── utils/parser.py (DIRECT IMPORT - file parsing)
```
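
The three direct imports correspond to three stages that run in sequence: parse, extract, build. The sketch below shows only that ordering. The function name `format_resume` and its callable parameters are assumptions made for illustration; the real signatures in `parser.py`, `hybrid_extractor.py`, and `builder.py` are not shown in this guide.

```python
def format_resume(file_path, parse, extract, build):
    """Run the three stages in order (hypothetical wiring, not project code)."""
    raw_text = parse(file_path)      # parser.py stage: file -> raw text
    resume_data = extract(raw_text)  # hybrid_extractor.py stage: text -> structured data
    return build(resume_data)        # builder.py stage: structured data -> document


# Stand-in callables so the sketch runs without the real modules.
result = format_resume(
    "resume.pdf",
    parse=lambda path: "Jane Doe\nEngineer",
    extract=lambda text: {"name": text.split("\n")[0]},
    build=lambda data: "Formatted resume for " + data["name"],
)
```

Passing the stages in as callables keeps the sketch independent of any particular module API.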

## 🎯 **File Purposes for Format_Resume.py**

### **✅ REQUIRED - Core Extraction System**

| File | Purpose | When Used | Priority |
|------|---------|-----------|----------|
| `hybrid_extractor.py` | **Main entry point** - orchestrates all extraction methods | Always (Format_Resume.py imports this) | 🔴 CRITICAL |
| `openai_extractor.py` | **PRIMARY AI** - OpenAI GPT-4o extraction with contact info | When `use_openai=True` (best results) | 🟠 PRIMARY |
| `hf_cloud_extractor.py` | **BACKUP AI** - Hugging Face Cloud API extraction | When OpenAI fails or unavailable | 🟡 BACKUP |
| `ai_extractor.py` | **Alternative AI** - HF AI models extraction | Alternative backup method | 🟢 FALLBACK |
| `hf_extractor_simple.py` | **Simple AI** - Simplified local processing | When cloud APIs fail | 🟢 FALLBACK |
| `extractor_fixed.py` | **Reliable fallback** - Regex-based extraction with spaCy | When all AI methods fail | 🔵 LAST RESORT |

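The priority order in the table amounts to a fallback chain: try each method in turn and stop at the first one that succeeds. The sketch below illustrates that pattern only. `extract_with_fallback` and the stand-in extractors are hypothetical, and the actual logic in `hybrid_extractor.py` may differ.

```python
import logging

logger = logging.getLogger(__name__)


def extract_with_fallback(text, extractors):
    """Try each (name, extractor) pair in priority order.

    Returns (name, result) for the first extractor that neither raises
    nor returns an empty result. Raises RuntimeError if none succeeds.
    """
    errors = []
    for name, extractor in extractors:
        try:
            result = extractor(text)
        except Exception as exc:  # any failure moves on to the next method
            logger.warning("%s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("All extraction methods failed: " + "; ".join(errors))


# Stand-ins for illustration: a primary that is unavailable and a last resort.
def failing_primary(text):
    raise ConnectionError("API unavailable")


def regex_last_resort(text):
    return {"name": text.split("\n")[0]}


method, data = extract_with_fallback(
    "Jane Doe\nEngineer",
    [("openai", failing_primary), ("regex", regex_last_resort)],
)
```

Here the primary stand-in raises, so the call falls through to the last-resort stand-in. Keeping the chain as an ordered list makes the priority explicit and easy to reorder.
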
### **✅ REQUIRED - Document Processing**

| File | Purpose | When Used | Priority |
|------|---------|-----------|----------|
| `builder.py` | **Document generation** - Creates formatted Word docs with preserved headers/footers | Always (Format_Resume.py imports this) | 🔴 CRITICAL |
| `parser.py` | **File parsing** - Extracts raw text from PDF/DOCX files | Always (Format_Resume.py imports this) | 🔴 CRITICAL |

### **✅ REQUIRED - Reference Data**

| File | Purpose | When Used | Priority |
|------|---------|-----------|----------|
| `data/job_titles.json` | **Job title patterns** - Used by extractor_fixed.py for regex matching | When all AI methods fail (fallback) | 🟡 BACKUP |
| `data/skills.json` | **Skills database** - Used by extractor_fixed.py for spaCy skill matching | When all AI methods fail (fallback) | 🟡 BACKUP |

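To illustrate how reference data of this kind can drive matching, the sketch below loads a term list from JSON and finds whole-word, case-insensitive matches in resume text. It rests on two assumptions: each JSON file is treated as a flat list of strings (the real schema is not documented here), and plain regular expressions stand in for the spaCy matching that `extractor_fixed.py` is described as using. `load_terms` and `find_terms` are hypothetical helpers.

```python
import json
import re
from pathlib import Path


def load_terms(path):
    """Load a JSON file assumed to hold a flat list of strings."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def find_terms(text, terms):
    """Return the terms that occur in text as whole words, ignoring case."""
    found = []
    for term in terms:
        # Lookarounds instead of \b so terms such as "C++" still match.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found
```

For example, `find_terms("Senior Python developer with C++ and SQL", ["Python", "C++", "Java", "sql"])` returns `["Python", "C++", "sql"]`.
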

## 🚀 **Format_Resume.py Extraction Flow**

```
1. User uploads resume → parser.py extracts raw text
2. hybrid_extractor.py orchestrates extraction:
   ├── Try openai_extractor.py (PRIMARY GPT-4o - best accuracy)
   ├── If fails → Try hf_cloud_extractor.py (BACKUP - good accuracy)
   ├── If fails → Try ai_extractor.py (alternative backup)
   ├── If fails → Try hf_extractor_simple.py (simple backup)
   └── If all fail → Use extractor_fixed.py (regex fallback) → uses data/*.json
3. builder.py generates formatted Word document with preserved template headers/footers
4. User downloads formatted resume with Qvell branding and proper formatting
```