📁 Utils Directory Guide - Format_Resume.py Focus

🎯 REQUIRED FILES for Format_Resume.py (10 out of 11 files)

Format_Resume.py uses OpenAI GPT-4o as its primary extractor and HF Cloud as its backup. Based on an analysis of that functionality, these are the essential files:

utils/
├── 🎯 CORE EXTRACTION SYSTEM (Format_Resume.py dependencies)
│   ├── hybrid_extractor.py      # ⭐ REQUIRED - Main orchestrator (direct import)
│   ├── openai_extractor.py      # ⭐ REQUIRED - OpenAI GPT-4o (PRIMARY method)
│   ├── hf_cloud_extractor.py    # ⭐ REQUIRED - HF Cloud API (BACKUP method)
│   ├── ai_extractor.py          # ⭐ REQUIRED - Alternative HF AI (fallback)
│   ├── hf_extractor_simple.py   # ⭐ REQUIRED - Simple HF (fallback)
│   └── extractor_fixed.py       # ⭐ REQUIRED - Regex fallback (last resort)
│
├── 🏗️ DOCUMENT PROCESSING (Format_Resume.py dependencies)
│   ├── builder.py               # ⭐ REQUIRED - Resume document generation with header/footer preservation
│   └── parser.py                # ⭐ REQUIRED - PDF/DOCX text extraction (direct import)
│
└── 📊 REFERENCE DATA (Required for fallback system)
    └── data/                    # ⭐ REQUIRED - Used by extractor_fixed.py fallback
        ├── job_titles.json      # ⭐ REQUIRED - Job title patterns for regex extraction
        └── skills.json          # ⭐ REQUIRED - Skills matching for spaCy extraction

🔗 Dependency Chain for Format_Resume.py

pages/Format_Resume.py
├── utils/hybrid_extractor.py (DIRECT IMPORT - orchestrator)
│   ├── utils/openai_extractor.py (PRIMARY GPT-4o - best accuracy)
│   ├── utils/hf_cloud_extractor.py (BACKUP - good accuracy)
│   ├── utils/ai_extractor.py (alternative backup)
│   ├── utils/hf_extractor_simple.py (simple backup)
│   └── utils/extractor_fixed.py (regex fallback) → uses data/job_titles.json & data/skills.json
├── utils/builder.py (DIRECT IMPORT - document generation with template preservation)
└── utils/parser.py (DIRECT IMPORT - file parsing)

🎯 File Purposes for Format_Resume.py

✅ REQUIRED - Core Extraction System

File	Purpose	When Used	Priority
`hybrid_extractor.py`	Main entry point - orchestrates all extraction methods	Always (Format_Resume.py imports this)	🔴 CRITICAL
`openai_extractor.py`	PRIMARY AI - OpenAI GPT-4o extraction with contact info	When `use_openai=True` (best results)	🟠 PRIMARY
`hf_cloud_extractor.py`	BACKUP AI - Hugging Face Cloud API extraction	When OpenAI fails or unavailable	🟡 BACKUP
`ai_extractor.py`	Alternative AI - HF AI models extraction	Alternative backup method	🟢 FALLBACK
`hf_extractor_simple.py`	Simple AI - Simplified local processing	When cloud APIs fail	🟢 FALLBACK
`extractor_fixed.py`	Reliable fallback - Regex-based extraction with spaCy	When all AI methods fail	🔵 LAST RESORT

✅ REQUIRED - Document Processing

File	Purpose	When Used	Priority
`builder.py`	Document generation - Creates formatted Word docs with preserved headers/footers	Always (Format_Resume.py imports this)	🔴 CRITICAL
`parser.py`	File parsing - Extracts raw text from PDF/DOCX files	Always (Format_Resume.py imports this)	🔴 CRITICAL

✅ REQUIRED - Reference Data

File	Purpose	When Used	Priority
`data/job_titles.json`	Job title patterns - Used by extractor_fixed.py for regex matching	When all AI methods fail (fallback)	🔵 LAST RESORT
`data/skills.json`	Skills database - Used by extractor_fixed.py for spaCy skill matching	When all AI methods fail (fallback)	🔵 LAST RESORT

❌ NOT NEEDED - Other Features

File	Purpose	Why Not Needed
`screening.py`	Resume evaluation, scoring, candidate screening	Used by TalentLens.py, not Format_Resume.py

🚀 Format_Resume.py Extraction Flow

1. User uploads resume → parser.py extracts raw text
2. hybrid_extractor.py orchestrates extraction:
   ├── Try openai_extractor.py (PRIMARY GPT-4o - best accuracy)
   ├── If fails → Try hf_cloud_extractor.py (BACKUP - good accuracy)
   ├── If fails → Try ai_extractor.py (alternative backup)
   ├── If fails → Try hf_extractor_simple.py (simple backup)
   └── If all fail → Use extractor_fixed.py (regex fallback) → uses data/*.json
3. builder.py generates formatted Word document with preserved template headers/footers
4. User downloads formatted resume with Qvell branding and proper formatting
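
The ordering in step 2 can be sketched as a loop over extractors in priority order. This is a minimal sketch, not the actual hybrid_extractor.py code: `run_fallback_chain`, `looks_complete`, the stand-in extractor functions, and the validation rule are all hypothetical. Only the tier ordering and the `extraction_method` key come from this guide.

```python
from typing import Callable, Dict, List, Tuple

Extractor = Callable[[str], Dict]


def looks_complete(data: Dict) -> bool:
    """Hypothetical quality check: require a name and at least one skill."""
    return bool(data.get("name")) and bool(data.get("skills"))


def run_fallback_chain(text: str, chain: List[Tuple[str, Extractor]]) -> Dict:
    """Try each extractor in order and return the first result that validates.

    The final entry is the last resort: its result is returned without
    validation, as long as it runs without raising.
    """
    last_name, last_extractor = chain[-1]
    for name, extractor in chain[:-1]:
        try:
            data = extractor(text)
        except Exception:
            continue  # API error, missing key, timeout: move to the next tier
        if looks_complete(data):
            data["extraction_method"] = name
            return data
    data = last_extractor(text)
    data["extraction_method"] = last_name
    return data


# Stand-in extractors for illustration only (not the real modules).
def failing_openai(text: str) -> Dict:
    raise RuntimeError("OPENAI_API_KEY not set")


def incomplete_hf_cloud(text: str) -> Dict:
    return {"name": "", "skills": []}


def regex_fallback(text: str) -> Dict:
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {"name": lines[0], "skills": lines[-1].split(", ")}


chain = [
    ("openai", failing_openai),
    ("hf_cloud", incomplete_hf_cloud),
    ("regex", regex_fallback),
]
result = run_fallback_chain("John Doe\nSoftware Engineer\nPython, Java, React", chain)
print(result["extraction_method"], result["skills"])
```

In this sketch an exception and an incomplete result are handled the same way: the chain moves on to the next tier.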

🏗️ Document Builder Enhancements

builder.py has been enhanced to preserve the template when generating documents:

Header/Footer Preservation

✅ Preserves Qvell logo and branding in header
✅ Maintains footer address (6001 Tain Dr. Suite 203, Dublin, OH, 43016)
✅ Eliminates blank pages by clearing only body content
✅ Preserves image references to prevent broken images

Content Generation Features

✅ Professional Summary extraction and formatting
✅ Skills table with 3-column layout
✅ Professional Experience with job titles, companies, dates
✅ Career Timeline with chronological job history
✅ Education and Training sections
✅ Proper date formatting (e.g., "February 2017 – Present")
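
The date format above can be produced with a small helper. This is a sketch under assumptions: `format_date_range` is a hypothetical name, and builder.py may format dates differently. Month names are listed explicitly so the output does not depend on the system locale.

```python
from datetime import date
from typing import Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date_range(start: date, end: Optional[date] = None) -> str:
    """Render 'Month YYYY – Month YYYY', or 'Month YYYY – Present' if open-ended."""
    def fmt(d: date) -> str:
        return f"{MONTHS[d.month - 1]} {d.year}"

    end_text = fmt(end) if end is not None else "Present"
    return f"{fmt(start)} \u2013 {end_text}"  # \u2013 is an en dash


print(format_date_range(date(2017, 2, 1)))
print(format_date_range(date(2014, 6, 1), date(2017, 1, 31)))
```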

📊 File Usage Statistics

Total utils files: 11
Required for Format_Resume.py: 10 files (91%)
Not needed for Format_Resume.py: 1 file (9%)

🧹 Cleanup Recommendations

If you want to minimize the utils folder for Format_Resume.py only:

Keep These 10 Files:

utils/
├── hybrid_extractor.py      # Main orchestrator
├── openai_extractor.py      # OpenAI GPT-4o (primary)
├── hf_cloud_extractor.py    # HF Cloud (backup)
├── ai_extractor.py          # HF AI (fallback)
├── hf_extractor_simple.py   # Simple HF (fallback)
├── extractor_fixed.py       # Regex (last resort)
├── builder.py               # Document generation with template preservation
├── parser.py                # File parsing
└── data/
    ├── job_titles.json      # Job title patterns for regex fallback
    └── skills.json          # Skills database for spaCy fallback

Can Remove This 1 File (if only using Format_Resume.py):

utils/
└── screening.py             # Only used by TalentLens.py
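
Before removing anything, it may help to confirm that the 10 required files are still present. The following is a minimal sketch: `missing_required_files` is a hypothetical helper, and it assumes the script runs from the project root, where the `utils/` folder lives.

```python
from pathlib import Path
from typing import List

# The 10 files this guide lists as required for Format_Resume.py.
REQUIRED_FILES = [
    "hybrid_extractor.py",
    "openai_extractor.py",
    "hf_cloud_extractor.py",
    "ai_extractor.py",
    "hf_extractor_simple.py",
    "extractor_fixed.py",
    "builder.py",
    "parser.py",
    "data/job_titles.json",
    "data/skills.json",
]


def missing_required_files(utils_dir: Path) -> List[str]:
    """Return the required paths (relative to utils_dir) that do not exist."""
    return [rel for rel in REQUIRED_FILES if not (utils_dir / rel).is_file()]


missing = missing_required_files(Path("utils"))
if missing:
    print("Missing required files:", ", ".join(missing))
else:
    print("All 10 required files are present.")
```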

💡 Best Practices for Format_Resume.py

Always use hybrid_extractor.py as your main entry point
Set environment variables for best results:
- OPENAI_API_KEY for OpenAI GPT-4o (primary)
- HF_API_TOKEN for Hugging Face Cloud (backup)

Use this configuration in Format_Resume.py:

from utils.hybrid_extractor import extract_resume_sections

data = extract_resume_sections(
    resume_text,          # Raw text returned by parser.py
    prefer_ai=True,
    use_openai=True,      # Try OpenAI GPT-4o first (best results)
    use_hf_cloud=True     # Fallback to HF Cloud (good backup)
)

Template preservation is automatic - builder.py keeps the template's headers and footers
The fallback system still returns a result when every AI method fails, because extractor_fixed.py runs as the last resort
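
One way to connect the environment variables to the keyword flags is to enable a method only when its key is set. This is a sketch, not documented behavior: `extraction_flags` is a hypothetical helper, and this guide does not say whether hybrid_extractor.py already performs this check internally.

```python
import os
from typing import Dict, Mapping


def extraction_flags(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Enable each cloud method only if its API key is set and non-blank."""
    return {
        "use_openai": bool(env.get("OPENAI_API_KEY", "").strip()),
        "use_hf_cloud": bool(env.get("HF_API_TOKEN", "").strip()),
    }


# Example with an explicit mapping instead of the real environment.
flags = extraction_flags({"OPENAI_API_KEY": "sk-example", "HF_API_TOKEN": ""})
print(flags)
```

The resulting dictionary could then be passed as `extract_resume_sections(resume_text, prefer_ai=True, **flags)`.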

🔧 Recent System Improvements

Header/Footer Preservation (Latest Fix)

Problem: Template headers and footers were being lost during document generation
Solution: Conservative content clearing that preserves document structure
Result: Qvell branding and footer address now properly maintained

Extraction Quality Enhancements

OpenAI GPT-4o Integration: Primary extraction method with structured prompts
Contact Info Extraction: Automatic email, phone, LinkedIn detection
Skills Cleaning: Improved filtering to remove company names and broken fragments
Experience Structuring: Better job title, company, and date extraction
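
As an illustration of the skills cleaning step, the sketch below drops company names and obviously broken fragments from a skills list. The rules here (unbalanced parentheses, dangling connector words, case-insensitive de-duplication) and the `clean_skills` name are assumptions for illustration. The actual filtering in the extractors may differ.

```python
import re
from typing import Iterable, List

# Words that suggest a fragment was cut off mid-phrase (assumed rule).
DANGLING_WORDS = {"and", "or", "with", "of", "in", "for"}


def clean_skills(skills: Iterable[str], company_names: Iterable[str]) -> List[str]:
    """Filter a raw skills list, keeping first-seen order."""
    companies = {name.strip().lower() for name in company_names}
    cleaned: List[str] = []
    seen = set()
    for raw in skills:
        # Collapse whitespace and trim stray separators from both ends.
        skill = re.sub(r"\s+", " ", raw).strip(" ,;:-")
        key = skill.lower()
        if not skill or key in seen or key in companies:
            continue
        if skill.count("(") != skill.count(")"):
            continue  # broken fragment such as "React ("
        if key.split(" ")[-1] in DANGLING_WORDS:
            continue  # broken fragment such as "Java and"
        seen.add(key)
        cleaned.append(skill)
    return cleaned


raw_skills = ["Python", "python", "Acme Corp", "React (", "Java and", "  SQL ,", ""]
print(clean_skills(raw_skills, company_names=["Acme Corp"]))
```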

Fallback System Reliability

JSON Dependencies: job_titles.json and skills.json required for regex fallback
Quality Validation: Each extraction method is validated before acceptance
Graceful Degradation: If every AI method fails, the regex fallback still produces output

🧪 Testing Format_Resume.py Dependencies

# Test all required components for Format_Resume.py
from utils.hybrid_extractor import extract_resume_sections, HybridResumeExtractor
from utils.builder import build_resume_from_data
from utils.parser import parse_resume

# Test extraction with all fallbacks
sample_text = "John Doe\nSoftware Engineer\nPython, Java, React"
result = extract_resume_sections(sample_text, prefer_ai=True, use_openai=True, use_hf_cloud=True)

# Test document building with template preservation
template_path = "templates/blank_resume.docx"
doc = build_resume_from_data(template_path, result)

print("✅ All Format_Resume.py dependencies working!")
print(f"✅ Extraction method used: {result.get('extraction_method', 'unknown')}")
print(f"✅ Template sections found (each carries header/footer settings): {len(doc.sections)}")

🎯 System Architecture Summary

The Format_Resume.py system now provides:

Robust Extraction: 5-tier fallback system (OpenAI → HF Cloud → HF AI → HF Simple → Regex)
Template Preservation: Headers, footers, and branding carried over from the template
Quality Assurance: Each extraction method validated for completeness
Professional Output: Properly formatted Word documents with consistent styling
Reliability: The regex last resort produces output even when every AI method fails

The utils directory analysis shows 10 out of 11 files are needed for Format_Resume.py functionality! 🎯

Recent improvements preserve the template's headers and footers and make extraction quality more reliable. ✨