Utils Directory Guide - Format_Resume.py Focus
🎯 REQUIRED FILES for Format_Resume.py (10 out of 11 files)
After analyzing the Format_Resume.py functionality with OpenAI GPT-4o as primary and HF Cloud as backup, here are the essential files:

    utils/
    ├── 🎯 CORE EXTRACTION SYSTEM (Format_Resume.py dependencies)
    │   ├── hybrid_extractor.py       # REQUIRED - Main orchestrator (direct import)
    │   ├── openai_extractor.py       # REQUIRED - OpenAI GPT-4o (PRIMARY method)
    │   ├── hf_cloud_extractor.py     # REQUIRED - HF Cloud API (BACKUP method)
    │   ├── ai_extractor.py           # REQUIRED - Alternative HF AI (fallback)
    │   ├── hf_extractor_simple.py    # REQUIRED - Simple HF (fallback)
    │   └── extractor_fixed.py        # REQUIRED - Regex fallback (last resort)
    │
    ├── DOCUMENT PROCESSING (Format_Resume.py dependencies)
    │   ├── builder.py                # REQUIRED - Resume document generation with header/footer preservation
    │   └── parser.py                 # REQUIRED - PDF/DOCX text extraction (direct import)
    │
    └── REFERENCE DATA (Required for fallback system)
        └── data/                     # REQUIRED - Used by extractor_fixed.py fallback
            ├── job_titles.json       # REQUIRED - Job title patterns for regex extraction
            └── skills.json           # REQUIRED - Skills matching for spaCy extraction

Dependency Chain for Format_Resume.py

    pages/Format_Resume.py
    ├── utils/hybrid_extractor.py (DIRECT IMPORT - orchestrator)
    │   ├── utils/openai_extractor.py (PRIMARY GPT-4o - best accuracy)
    │   ├── utils/hf_cloud_extractor.py (BACKUP - good accuracy)
    │   ├── utils/ai_extractor.py (alternative backup)
    │   ├── utils/hf_extractor_simple.py (simple backup)
    │   └── utils/extractor_fixed.py (regex fallback) → uses data/job_titles.json & data/skills.json
    ├── utils/builder.py (DIRECT IMPORT - document generation with template preservation)
    └── utils/parser.py (DIRECT IMPORT - file parsing)

🎯 File Purposes for Format_Resume.py
REQUIRED - Core Extraction System

File | Purpose | When Used | Priority |
---|---|---|---|
hybrid_extractor.py | Main entry point - orchestrates all extraction methods | Always (Format_Resume.py imports this) | 🔴 CRITICAL |
openai_extractor.py | PRIMARY AI - OpenAI GPT-4o extraction with contact info | When use_openai=True (best results) | 🟠 PRIMARY |
hf_cloud_extractor.py | BACKUP AI - Hugging Face Cloud API extraction | When OpenAI fails or is unavailable | 🟡 BACKUP |
ai_extractor.py | Alternative AI - HF AI models extraction | Alternative backup method | 🟢 FALLBACK |
hf_extractor_simple.py | Simple AI - Simplified local processing | When cloud APIs fail | 🟢 FALLBACK |
extractor_fixed.py | Reliable fallback - Regex-based extraction with spaCy | When all AI methods fail | 🔵 LAST RESORT |

REQUIRED - Document Processing

File | Purpose | When Used | Priority |
---|---|---|---|
builder.py | Document generation - Creates formatted Word docs with preserved headers/footers | Always (Format_Resume.py imports this) | 🔴 CRITICAL |
parser.py | File parsing - Extracts raw text from PDF/DOCX files | Always (Format_Resume.py imports this) | 🔴 CRITICAL |

REQUIRED - Reference Data

File | Purpose | When Used | Priority |
---|---|---|---|
data/job_titles.json | Job title patterns - Used by extractor_fixed.py for regex matching | When all AI methods fail (fallback) | 🟡 BACKUP |
data/skills.json | Skills database - Used by extractor_fixed.py for spaCy skill matching | When all AI methods fail (fallback) | 🟡 BACKUP |
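
To illustrate how the regex fallback might consume these reference files, here is a minimal sketch. It assumes each JSON file holds a flat list of strings, which this guide does not confirm, and `load_terms` and `find_terms` are hypothetical helper names rather than functions from extractor_fixed.py.

```python
import json
import re
from pathlib import Path


def load_terms(path):
    """Load a flat JSON list of strings (assumed schema, not confirmed)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_terms(text, terms):
    """Return the terms that occur in text as whole words, case-insensitively."""
    found = []
    for term in terms:
        # Lookarounds instead of \b so terms such as "C++" still match.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Inline list used here in place of load_terms("utils/data/skills.json").
skills = ["Python", "C++", "Java"]
print(find_terms("Engineer with Python, C++ and JavaScript", skills))
# ['Python', 'C++']
```

The whole-word check keeps "Java" from matching inside "JavaScript", which is the kind of false positive a plain substring search would produce.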
NOT NEEDED - Other Features

File | Purpose | Why Not Needed |
---|---|---|
screening.py | Resume evaluation, scoring, candidate screening | Used by TalentLens.py, not Format_Resume.py |

Format_Resume.py Extraction Flow

    1. User uploads resume → parser.py extracts raw text
    2. hybrid_extractor.py orchestrates extraction:
       ├── Try openai_extractor.py (PRIMARY GPT-4o - best accuracy)
       ├── If fails → Try hf_cloud_extractor.py (BACKUP - good accuracy)
       ├── If fails → Try ai_extractor.py (alternative backup)
       ├── If fails → Try hf_extractor_simple.py (simple backup)
       └── If all fail → Use extractor_fixed.py (regex fallback) → uses data/*.json
    3. builder.py generates formatted Word document with preserved template headers/footers
    4. User downloads formatted resume with Qvell branding and proper formatting
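
The orchestration in step 2 can be pictured as an ordered list of extractors tried in turn. The sketch below is illustrative only and is not the actual hybrid_extractor.py code: `extract_with_fallbacks`, `failing_ai`, and `regex_fallback` are hypothetical names, and the assumption that a failing tier raises an exception or returns an empty result is mine.

```python
from typing import Callable, Dict, List, Tuple

Extractor = Callable[[str], Dict]


def extract_with_fallbacks(text: str, extractors: List[Tuple[str, Extractor]]) -> Dict:
    """Try each (name, extractor) pair in order and return the first usable result."""
    errors = {}
    for name, extractor in extractors:
        try:
            result = extractor(text)
        except Exception as exc:  # any failure moves on to the next tier
            errors[name] = str(exc)
            continue
        if result:  # an empty dict counts as a failed attempt
            result["extraction_method"] = name
            return result
        errors[name] = "empty result"
    raise RuntimeError(f"all extractors failed: {errors}")


# Hypothetical stand-ins for the real extractor modules.
def failing_ai(text):
    raise ConnectionError("API unavailable")


def regex_fallback(text):
    return {"name": text.splitlines()[0]}


result = extract_with_fallbacks(
    "John Doe\nSoftware Engineer",
    [("openai", failing_ai), ("regex", regex_fallback)],
)
print(result["extraction_method"])  # regex
```

Recording which tier succeeded under `extraction_method` mirrors the key that the test script in this guide reads from the result.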
Document Builder Enhancements
builder.py has been enhanced to properly handle template preservation:
Header/Footer Preservation
- Preserves Qvell logo and branding in header
- Maintains footer address (6001 Tain Dr. Suite 203, Dublin, OH, 43016)
- Eliminates blank pages by clearing only body content
- Preserves image references to prevent broken images
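
Why clearing only the body keeps headers and footers intact: in a .docx file, header and footer parts are referenced from the `w:sectPr` element at the end of the body in word/document.xml, so removing everything except that element leaves the references in place. The following is a minimal standard-library sketch of that idea, not the actual builder.py implementation; `clear_body_keep_sections` is a hypothetical name, and it handles only the body-level `w:sectPr` (section breaks stored inside paragraphs would be removed).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
ET.register_namespace("w", W_NS)
ET.register_namespace("r", R_NS)


def clear_body_keep_sections(document_xml: str) -> str:
    """Remove body content but keep the body-level w:sectPr element."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    for child in list(body):
        if child.tag != f"{{{W_NS}}}sectPr":
            body.remove(child)
    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for word/document.xml from a template.
sample = (
    f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body>'
    "<w:p><w:r><w:t>Old placeholder text</w:t></w:r></w:p>"
    '<w:sectPr><w:headerReference w:type="default" r:id="rId7"/></w:sectPr>'
    "</w:body></w:document>"
)
cleaned = clear_body_keep_sections(sample)
print("headerReference" in cleaned)       # True
print("Old placeholder text" in cleaned)  # False
```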
Content Generation Features
- Professional Summary extraction and formatting
- Skills table with 3-column layout
- Professional Experience with job titles, companies, dates
- Career Timeline chronological job history
- Education and Training sections
- Proper date formatting (e.g., "February 2017 – Present")
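
Two of the features above, the 3-column skills table and the date range format, come down to small transformations. A minimal sketch, assuming skills arrive as a flat list and dates as `datetime.date` values; `skills_to_rows` and `format_date_range` are hypothetical helpers, not functions from builder.py.

```python
from datetime import date
from typing import List, Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def skills_to_rows(skills: List[str], columns: int = 3) -> List[List[str]]:
    """Split a flat skills list into table rows, padding the last row with ''."""
    rows = [skills[i:i + columns] for i in range(0, len(skills), columns)]
    if rows:
        rows[-1] = rows[-1] + [""] * (columns - len(rows[-1]))
    return rows


def format_date_range(start: date, end: Optional[date] = None) -> str:
    """Render 'Month YYYY – Month YYYY', using 'Present' for a current job."""
    def label(d: date) -> str:
        return f"{MONTHS[d.month - 1]} {d.year}"

    return f"{label(start)} – {label(end) if end else 'Present'}"


print(skills_to_rows(["Python", "Java", "React", "SQL"]))
# [['Python', 'Java', 'React'], ['SQL', '', '']]
print(format_date_range(date(2017, 2, 1)))
# February 2017 – Present
```

Month names are spelled out in a list rather than taken from `strftime("%B")` so the output does not change with the system locale.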
File Usage Statistics
- Total utils files: 11
- Required for Format_Resume.py: 10 files (91%)
- Not needed for Format_Resume.py: 1 file (9%)
🧹 Cleanup Recommendations
If you want to minimize the utils folder for Format_Resume.py only:
Keep These 10 Files:

    utils/
    ├── hybrid_extractor.py      # Main orchestrator
    ├── openai_extractor.py      # OpenAI GPT-4o (primary)
    ├── hf_cloud_extractor.py    # HF Cloud (backup)
    ├── ai_extractor.py          # HF AI (fallback)
    ├── hf_extractor_simple.py   # Simple HF (fallback)
    ├── extractor_fixed.py       # Regex (last resort)
    ├── builder.py               # Document generation with template preservation
    ├── parser.py                # File parsing
    └── data/
        ├── job_titles.json      # Job title patterns for regex fallback
        └── skills.json          # Skills database for spaCy fallback

Can Remove This 1 File (if only using Format_Resume.py):

    utils/
    └── screening.py             # Only used by TalentLens.py

💡 Best Practices for Format_Resume.py
- Always use hybrid_extractor.py as your main entry point
- Set environment variables for best results:
  - OPENAI_API_KEY for OpenAI GPT-4o (primary)
  - HF_API_TOKEN for Hugging Face Cloud (backup)
- Use this configuration in Format_Resume.py:

      data = extract_resume_sections(
          resume_text,
          prefer_ai=True,
          use_openai=True,    # Try OpenAI GPT-4o first (best results)
          use_hf_cloud=True,  # Fall back to HF Cloud (good backup)
      )

- Template preservation is automatic - headers and footers are maintained
- Fallback system ensures extraction never completely fails
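
One way to tie the environment variables to the flags shown above is to switch each cloud tier on only when its credential is present. This is a hedged sketch: `extraction_flags` is a hypothetical helper, and whether hybrid_extractor.py already performs a similar check internally is not stated in this guide.

```python
import os


def extraction_flags(env=None):
    """Build keyword flags for extract_resume_sections from available credentials."""
    env = os.environ if env is None else env
    return {
        "prefer_ai": True,
        "use_openai": bool(env.get("OPENAI_API_KEY")),
        "use_hf_cloud": bool(env.get("HF_API_TOKEN")),
    }


# Example with an explicit mapping instead of the real environment.
print(extraction_flags({"OPENAI_API_KEY": "sk-test"}))
# {'prefer_ai': True, 'use_openai': True, 'use_hf_cloud': False}
```

The result could then be passed along as `extract_resume_sections(resume_text, **extraction_flags())`, so a missing key skips that tier up front instead of failing at request time.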
🔧 Recent System Improvements
Header/Footer Preservation (Latest Fix)
- Problem: Template headers and footers were being lost during document generation
- Solution: Conservative content clearing that preserves document structure
- Result: Qvell branding and footer address now properly maintained
Extraction Quality Enhancements
- OpenAI GPT-4o Integration: Primary extraction method with structured prompts
- Contact Info Extraction: Automatic email, phone, LinkedIn detection
- Skills Cleaning: Improved filtering to remove company names and broken fragments
- Experience Structuring: Better job title, company, and date extraction
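
As an illustration of the contact info detection mentioned above, the sketch below pulls the first email, phone number, and LinkedIn URL out of resume text with regular expressions. The patterns are my own simplified assumptions (the phone pattern targets North American numbers only) and are not taken from openai_extractor.py, which may rely on the model rather than regexes.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# North American style numbers, with an optional country code.
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
LINKEDIN_RE = re.compile(r"(?:https?://)?(?:www\.)?linkedin\.com/in/[\w-]+", re.IGNORECASE)


def extract_contact_info(text):
    """Return the first email, phone, and LinkedIn match, or None for each."""
    def first(pattern):
        match = pattern.search(text)
        return match.group(0) if match else None

    return {
        "email": first(EMAIL_RE),
        "phone": first(PHONE_RE),
        "linkedin": first(LINKEDIN_RE),
    }


sample = "John Doe\njohn.doe@example.com | (614) 555-0123 | linkedin.com/in/johndoe"
print(extract_contact_info(sample))
# {'email': 'john.doe@example.com', 'phone': '(614) 555-0123', 'linkedin': 'linkedin.com/in/johndoe'}
```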
Fallback System Reliability
- JSON Dependencies: job_titles.json and skills.json required for regex fallback
- Quality Validation: Each extraction method is validated before acceptance
- Graceful Degradation: System never fails completely, always produces output
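
The quality validation step could look something like the following completeness check, applied to each tier's output before it is accepted. The field names and the threshold are assumptions made for illustration, as the actual result schema is not documented here; `missing_fields` and `is_acceptable` are hypothetical names.

```python
# Hypothetical field names; the real result schema may differ.
REQUIRED_FIELDS = ("name", "skills", "experience")


def missing_fields(result, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in result."""
    return [field for field in required if not result.get(field)]


def is_acceptable(result, max_missing=1):
    """Accept a result when at most max_missing required fields are empty."""
    return len(missing_fields(result)) <= max_missing


partial = {"name": "John Doe", "skills": [], "experience": [{"title": "Engineer"}]}
print(missing_fields(partial))  # ['skills']
print(is_acceptable(partial))   # True
print(is_acceptable({}))        # False
```

A rejected result would send the orchestrator on to the next extraction tier, which is how a weak AI response can still end in a usable regex result.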
🧪 Testing Format_Resume.py Dependencies

    # Test all required components for Format_Resume.py
    from utils.hybrid_extractor import extract_resume_sections, HybridResumeExtractor
    from utils.builder import build_resume_from_data
    from utils.parser import parse_resume

    # Test extraction with all fallbacks
    sample_text = "John Doe\nSoftware Engineer\nPython, Java, React"
    result = extract_resume_sections(sample_text, prefer_ai=True, use_openai=True, use_hf_cloud=True)

    # Test document building with template preservation
    template_path = "templates/blank_resume.docx"
    doc = build_resume_from_data(template_path, result)

    print("✅ All Format_Resume.py dependencies working!")
    print(f"✅ Extraction method used: {result.get('extraction_method', 'unknown')}")
    print(f"✅ Headers/footers preserved: {len(doc.sections)} sections")

🎯 System Architecture Summary
The Format_Resume.py system now provides:
- Robust Extraction: 5-tier fallback system (OpenAI → HF Cloud → HF AI → HF Simple → Regex)
- Template Preservation: Headers, footers, and branding maintained perfectly
- Quality Assurance: Each extraction method validated for completeness
- Professional Output: Properly formatted Word documents with consistent styling
- Reliability: System never fails completely, always produces usable output
The utils directory analysis shows 10 out of 11 files are needed for Format_Resume.py functionality! 🎯
Recent improvements ensure perfect template preservation and reliable extraction quality. ✨