CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is a production-ready GAIA benchmark AI agent achieving 85% accuracy through a sophisticated multi-agent architecture. The system has been fully refactored into a modular, maintainable architecture that specializes in complex question answering across multimedia, research, file processing, chess analysis, and mathematical reasoning domains.
Development Commands
Setup and Installation
```bash
# Install dependencies
pip install -r requirements.txt
# Test API key configuration
python test_api_keys.py
# Verify core functionality
python -c "from main import GAIASolver; print('✅ Core GAIASolver available')"
```
Running the System
```bash
# Run legacy monolithic solver
python main.py
# Run refactored modular solver (recommended)
python main_refactored.py
# Run Gradio web interface
python app.py
```
Testing Commands
```bash
# Comprehensive async testing
python async_complete_test.py
# Test question classification
python test_improved_classification.py
python final_classification_test.py
# Test YouTube functionality
python direct_youtube_test.py
python simple_youtube_test.py
python test_youtube_question.py
# Test individual components
python -c "from gaia_tools import GAIA_TOOLS; print(f'Available tools: {len(GAIA_TOOLS)}')"
python -c "from question_classifier import QuestionClassifier; c = QuestionClassifier(); print('✅ Classifier ready')"
```
Architecture Overview
Dual Architecture Design
This project maintains both legacy monolithic and refactored modular architectures:
Legacy Architecture (main.py):
- Monolithic 1285-line solver with all functionality integrated
- Comprehensive tool collection in gaia_tools.py (4887 lines)
- Single-file approach for rapid development and deployment
Refactored Architecture (gaia/ package):
```text
gaia/
├── core/                       # Main solver logic
│   ├── solver.py               # GAIASolver main class
│   ├── answer_extractor.py     # Specialized answer extraction classes
│   └── question_processor.py   # Question classification and processing
├── tools/                      # Tool implementations
│   ├── base.py                 # Abstract tool interface and registry
│   ├── registry.py             # Tool discovery and management
│   └── [specialized tool modules]
├── models/                     # Model providers and management
│   ├── manager.py              # ModelManager with fallback chains
│   └── providers.py            # LiteLLM, Gemini, Kluster providers
├── config/                     # Configuration management
│   └── settings.py             # Config, ModelConfig classes
└── utils/                      # Utilities and helpers
    ├── exceptions.py           # Custom exception hierarchy
    └── logging.py              # Logging configuration
```
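The tree describes `tools/base.py` as an abstract tool interface and registry. The sketch below shows one common way to structure that pattern with `abc`. The names `BaseTool`, `ToolRegistry`, and `EchoTool` are hypothetical and are not taken from the repository.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseTool(ABC):
    """Hypothetical abstract tool interface; the real gaia/tools/base.py may differ."""

    name: str = "base"
    description: str = ""

    @abstractmethod
    def run(self, query: str) -> str:
        """Execute the tool and return its raw text output."""


class ToolRegistry:
    """Hypothetical name-to-instance registry used for tool discovery."""

    def __init__(self) -> None:
        self._tools: Dict[str, BaseTool] = {}

    def register(self, tool: BaseTool) -> None:
        # Reject duplicates so two tools cannot silently shadow each other.
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> BaseTool:
        return self._tools[name]

    def names(self) -> List[str]:
        return sorted(self._tools)


class EchoTool(BaseTool):
    """Toy tool used only to demonstrate registration."""

    name = "echo"
    description = "Returns its input unchanged."

    def run(self, query: str) -> str:
        return query


registry = ToolRegistry()
registry.register(EchoTool())
```

Because the solver depends only on the abstract `run` method, a registry like this lets new tools be added without changing solver code. This is consistent with the dependency-injection design described for the refactored solver.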
Core Components
- GAIASolver (main.py): Legacy monolithic solver with 1000+ lines of processing logic
- GAIASolver (gaia/core/solver.py): Refactored main orchestrator using dependency injection
- QuestionClassifier: LLM-based routing with pattern-based fallbacks
- GAIA_TOOLS: 42 specialized tools, including enhanced Wikipedia research, chess analysis, Excel processing, and multimedia analysis
- ModelManager: Handles model initialization, fallback chains (Kluster.ai → Gemini → Qwen), and lifecycle management
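QuestionClassifier is described as LLM-based routing with pattern-based fallbacks. The following is a minimal sketch of that fallback layer under stated assumptions. The category names, regexes, and the `llm_classify` callable are all hypothetical, because the project's actual rules are not listed in this file.

```python
import re
from typing import Callable, Optional

# Hypothetical fallback rules, checked in order. YouTube comes first so that it
# overrides other matches, mirroring the "forced classification override" idea.
FALLBACK_RULES = [
    ("multimedia", re.compile(r"youtube\.com/watch|youtu\.be/", re.IGNORECASE)),
    ("chess", re.compile(r"\b(chess|FEN|checkmate)\b", re.IGNORECASE)),
    ("file_processing", re.compile(r"\.(xlsx|xls|py|txt|csv)\b|attached file", re.IGNORECASE)),
    ("math", re.compile(r"\b(calculate|compute|sum|average)\b", re.IGNORECASE)),
]


def classify_by_pattern(question: str, default: str = "research") -> str:
    """Return the first category whose pattern matches, else the default."""
    for category, pattern in FALLBACK_RULES:
        if pattern.search(question):
            return category
    return default


def classify(question: str, llm_classify: Optional[Callable[[str], str]] = None) -> str:
    """Try an LLM classifier first; fall back to deterministic patterns on failure."""
    if llm_classify is not None:
        try:
            return llm_classify(question)
        except Exception:
            pass  # LLM unavailable or errored: use the regex rules instead
    return classify_by_pattern(question)
```

Keeping the pattern rules deterministic means routing still works when every model provider is down, at the cost of coarser categories.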
Question Type Specialization
Research Questions (92% accuracy):
- Enhanced Wikipedia tools with date-specific searches and Featured Articles integration
- Multi-step research coordination with cross-validation
- Anti-hallucination safeguards to prevent fabrication
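Cross-validation across research steps can be as simple as accepting an answer only when enough independent sources agree on it. The sketch below uses a majority vote with a minimum agreement threshold. The function name `cross_validate` and the threshold of 2 are assumptions for illustration, not the project's implementation.

```python
from collections import Counter
from typing import Iterable, Optional


def cross_validate(candidates: Iterable[Optional[str]], min_agreement: int = 2) -> Optional[str]:
    """Hypothetical helper: return the most common normalized answer if at least
    `min_agreement` sources produced it, otherwise None (treat as unverified)."""
    # Normalize case and whitespace; drop empty or missing results.
    normalized = [c.strip().lower() for c in candidates if c and c.strip()]
    if not normalized:
        return None
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count >= min_agreement else None
```

Returning `None` instead of the best single guess gives the caller an explicit signal to do more research rather than report an unconfirmed fact.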
Chess Questions (100% accuracy):
- Universal FEN correction system handling any vision error pattern
- Multi-tool consensus system for maximum accuracy
- Perfect algebraic notation extraction
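A FEN correction step needs a way to detect that a board extracted by a vision model is malformed. The sketch below checks only the piece-placement field of a FEN string: it verifies 8 ranks, 8 squares per rank, valid piece letters, and one king per side. The function name is hypothetical, and the project's actual correction logic is not shown in this file.

```python
VALID_PIECES = set("prnbqkPRNBQK")
VALID_DIGITS = set("12345678")


def validate_fen_board(fen: str) -> list:
    """Hypothetical validator: return a list of problems found in the FEN
    piece-placement field. An empty list means the board is structurally valid."""
    fields = fen.split()
    board = fields[0] if fields else ""
    ranks = board.split("/")
    problems = []
    if len(ranks) != 8:
        problems.append(f"expected 8 ranks, found {len(ranks)}")
    for i, rank in enumerate(ranks):
        squares = 0
        for ch in rank:
            if ch in VALID_DIGITS:
                squares += int(ch)  # a digit encodes that many empty squares
            elif ch in VALID_PIECES:
                squares += 1
            else:
                problems.append(f"rank {8 - i}: invalid character {ch!r}")
        if squares != 8:
            problems.append(f"rank {8 - i}: {squares} squares instead of 8")
    for king in "Kk":
        if board.count(king) != 1:
            problems.append(f"expected exactly one {king!r}, found {board.count(king)}")
    return problems
```

FEN lists rank 8 first, which is why the messages report `8 - i`. A correction pass could re-run this check after each candidate fix and stop once the list is empty.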
YouTube/Multimedia Questions:
- Enhanced URL detection with multiple regex patterns
- Forced classification override for YouTube content
- Specialized prompts with explicit tool usage instructions
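URL detection with multiple regex patterns can be sketched as follows. Each pattern targets one URL shape (`watch?v=`, `youtu.be/`, `shorts/` or `embed/`) and captures the 11-character video ID. The patterns and the `find_youtube_ids` name are assumptions, not the project's actual regexes.

```python
import re
from typing import List

# Illustrative patterns; each captures an 11-character YouTube video ID.
YOUTUBE_PATTERNS = [
    re.compile(r"https?://(?:www\.|m\.)?youtube\.com/watch\?(?:[^\s#]*&)?v=([A-Za-z0-9_-]{11})"),
    re.compile(r"https?://youtu\.be/([A-Za-z0-9_-]{11})"),
    re.compile(r"https?://(?:www\.)?youtube\.com/(?:shorts|embed)/([A-Za-z0-9_-]{11})"),
]


def find_youtube_ids(text: str) -> List[str]:
    """Hypothetical helper: return unique video IDs in order of first appearance."""
    hits = []
    for pattern in YOUTUBE_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(1)))
    seen, ids = set(), []
    for _, video_id in sorted(hits):  # sort by position in the text
        if video_id not in seen:
            seen.add(video_id)
            ids.append(video_id)
    return ids
```

A non-empty result is a cheap, deterministic signal that could force the question onto the multimedia route before any LLM classification runs.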
File Processing (100% accuracy):
- Format-specific tools for Excel (.xlsx/.xls), Python (.py), text files
- Deterministic Python execution with sandboxed environment
- Financial calculation specialization with proper currency formatting
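A minimal sketch of the two file-processing ideas above follows. It assumes that "sandboxed" means running code in a separate interpreter with a timeout, and that currency means US dollars. `run_python_isolated` and `format_usd` are hypothetical names. Python's `-I` flag isolates the child from `PYTHON*` environment variables and user site-packages, but it is not a security sandbox.

```python
import subprocess
import sys
from decimal import Decimal, ROUND_HALF_UP


def run_python_isolated(code: str, timeout: float = 10.0) -> str:
    """Run `code` in a fresh interpreter (isolated mode) and return its stdout.

    Process isolation only: the child can still read files and use the network.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


def format_usd(amount) -> str:
    """Format a value as US dollars: two decimals, thousands separators."""
    # str() first so floats like 2.675 round as written, not as binary floats.
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"${value:,.2f}"
```

Using `Decimal` with explicit half-up rounding keeps the output deterministic for values that binary floating point cannot represent exactly.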
Environment Configuration
Required API Keys (set in .env)
- GEMINI_API_KEY: Primary model (Gemini Flash 2.0)
- HUGGINGFACE_TOKEN: Fallback model and classification
- KLUSTER_API_KEY: Optional premium model access
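To check which of these keys are configured without adding dependencies, a small stdlib-only reader is enough. The sketch below is hypothetical and is not the contents of `test_api_keys.py`. It parses plain `KEY=VALUE` lines and does not handle multi-line values or `export` prefixes, which `python-dotenv` would.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping

REQUIRED_KEYS = ["GEMINI_API_KEY", "HUGGINGFACE_TOKEN"]
OPTIONAL_KEYS = ["KLUSTER_API_KEY"]


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_keys(env: Mapping[str, str]) -> Dict[str, bool]:
    """Report which known keys are set to a non-empty value."""
    return {key: bool(env.get(key)) for key in REQUIRED_KEYS + OPTIONAL_KEYS}


def missing_required(env: Mapping[str, str]) -> List[str]:
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Real environment variables take precedence over values read from .env.
status = check_keys({**load_env_file(), **os.environ})
```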
Model Fallback Chain
- Kluster.ai (Qwen3-235B, Gemma3-27B) - Premium option
- Gemini Flash 2.0 - Primary production model
- Qwen 2.5-72B - Reliable fallback via HuggingFace
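A fallback chain like this one amounts to "try each provider in order and return the first success." The sketch below models providers as plain `(name, callable)` pairs. It is an illustration under that assumption and does not reflect the actual `ModelManager` API or any provider SDK.

```python
from typing import Callable, List, Tuple

# Hypothetical provider shape: a display name plus a prompt -> completion callable.
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, output) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure (missing key, timeout, 5xx) moves down the chain
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In practice the chain would be built only from providers whose keys are configured. That is how an optional premium provider such as Kluster.ai can sit first without breaking setups that lack its key.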
Key Design Patterns
Anti-Hallucination Architecture
- Tool result prioritization: Always uses exact tool outputs over internal reasoning
- Cross-validation: Multiple verification methods for critical information
- Source attribution: Clear tracking and validation of information sources
- Validation rules: Type-specific answer extraction and verification
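Type-specific extraction and verification might look like the following sketch. It assumes the model ends its reply with a `FINAL ANSWER:` line, which is an assumption here and not confirmed by this file, and it uses hypothetical helper names. Numeric answers are stripped of separators and symbols and then checked for parseability. List answers are re-joined with consistent spacing.

```python
import re
from typing import Optional


def extract_final_answer(text: str) -> Optional[str]:
    """Return the text after the last 'FINAL ANSWER:' marker, or None if absent."""
    matches = re.findall(r"FINAL ANSWER:\s*(.+)", text, flags=re.IGNORECASE)
    return matches[-1].strip() if matches else None


def normalize_answer(answer: str, answer_type: str) -> str:
    """Hypothetical type-specific cleanup applied before an answer is submitted."""
    if answer_type == "number":
        cleaned = answer.replace(",", "").replace("$", "").replace("%", "").strip()
        float(cleaned)  # verification step: raises ValueError if not numeric
        return cleaned
    if answer_type == "list":
        items = [item.strip() for item in answer.split(",") if item.strip()]
        return ", ".join(items)
    return answer.strip()
```

Raising on a non-numeric "number" answer, rather than passing it through, lets the caller retry extraction instead of submitting a malformed result.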
Performance Optimizations
- Fresh agent creation for each question to avoid token accumulation
- Concurrent processing support with async operations
- 15-minute web cache for improved response times
- Exponential backoff for API rate limiting
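The 15-minute web cache and the exponential backoff can both be sketched with the standard library alone. `TTLCache` and `retry_with_backoff` are hypothetical names. The clock and sleep functions are injectable so that expiry and retry timing can be tested without waiting. By default the retry helper retries on any exception, which you would normally narrow to rate-limit errors.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple, TypeVar

T = TypeVar("T")


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float = 15 * 60, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl      # 900 s, matching the 15-minute window described above
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: drop it so the caller refetches
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock(), value)


def retry_with_backoff(
    call: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[type, ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on `retry_on` errors with exponentially growing delays."""
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter in [50%, 100%] of the delay avoids synchronized retries.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```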
File Organization
Core Files
- main.py: Legacy monolithic solver (1285 lines)
- main_refactored.py: Entry point for the refactored architecture
- gaia_tools.py: 42 specialized tools with robust error handling (4887 lines)
- question_classifier.py: LLM + pattern-based classification system
- app.py: Production Gradio interface with comprehensive error handling
Supporting Files
- async_complete_test.py: Comprehensive async testing infrastructure
- enhanced_wikipedia_tools.py: Advanced Wikipedia research capabilities
- universal_fen_correction.py: Chess-specific FEN notation correction
- wikipedia_featured_articles_by_date.py: Date-specific Wikipedia searches
Local Configuration Notes
- The Hugging Face token (HUGGINGFACE_TOKEN) can be read from the secrets stored in .env.