# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a **production-ready GAIA benchmark AI agent** achieving 85% accuracy through a sophisticated multi-agent architecture. The system has been **fully refactored** into a modular, maintainable architecture that specializes in complex question answering across multimedia, research, file processing, chess analysis, and mathematical reasoning domains.

## Development Commands

### Setup and Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Test API key configuration
python test_api_keys.py

# Verify core functionality
python -c "from main import GAIASolver; print('✅ Core GAIASolver available')"
```

### Running the System

```bash
# Run legacy monolithic solver
python main.py

# Run refactored modular solver (recommended)
python main_refactored.py

# Run Gradio web interface
python app.py
```

### Testing Commands

```bash
# Comprehensive async testing
python async_complete_test.py

# Test question classification
python test_improved_classification.py
python final_classification_test.py

# Test YouTube functionality
python direct_youtube_test.py
python simple_youtube_test.py
python test_youtube_question.py

# Test individual components
python -c "from gaia_tools import GAIA_TOOLS; print(f'Available tools: {len(GAIA_TOOLS)}')"
python -c "from question_classifier import QuestionClassifier; c = QuestionClassifier(); print('✅ Classifier ready')"
```

## Architecture Overview

### Dual Architecture Design

This project maintains both **legacy monolithic** and **refactored modular** architectures:

**Legacy Architecture (main.py):**
- Monolithic 1285-line solver with all functionality integrated
- Comprehensive tool collection in gaia_tools.py (4887 lines)
- Single-file approach for rapid development and deployment

**Refactored Architecture (gaia/ package):**
```
gaia/
├── core/                      # Main solver logic
│   ├── solver.py              # GAIASolver main class
│   ├── answer_extractor.py    # Specialized answer extraction classes
│   └── question_processor.py  # Question classification and processing
├── tools/                     # Tool implementations
│   ├── base.py                # Abstract tool interface and registry
│   ├── registry.py            # Tool discovery and management
│   └── [specialized tool modules]
├── models/                    # Model providers and management
│   ├── manager.py             # ModelManager with fallback chains
│   └── providers.py           # LiteLLM, Gemini, Kluster providers
├── config/                    # Configuration management
│   └── settings.py            # Config, ModelConfig classes
└── utils/                     # Utilities and helpers
    ├── exceptions.py          # Custom exception hierarchy
    └── logging.py             # Logging configuration
```

### Core Components

**GAIASolver (main.py):** Legacy monolithic solver with 1000+ lines of sophisticated processing logic

**GAIASolver (gaia/core/solver.py):** Refactored main orchestrator using dependency injection

**QuestionClassifier:** LLM-based intelligent routing with pattern-based fallbacks

**GAIA_TOOLS:** 42 specialized tools including enhanced Wikipedia research, chess analysis, Excel processing, and multimedia analysis

**ModelManager:** Handles model initialization, fallback chains (Kluster.ai → Gemini → Qwen), and lifecycle management

### Question Type Specialization

**Research Questions (92% accuracy):**
- Enhanced Wikipedia tools with date-specific searches and Featured Articles integration
- Multi-step research coordination with cross-validation
- Anti-hallucination safeguards to prevent fabrication

**Chess Questions (100% accuracy):**
- Universal FEN correction system handling any vision error pattern
- Multi-tool consensus system for maximum accuracy
- Perfect algebraic notation extraction

**YouTube/Multimedia Questions:**
- Enhanced URL detection with multiple regex patterns
- Forced classification override for YouTube content
- Specialized prompts with explicit tool usage instructions

**File Processing (100% accuracy):**
- Format-specific tools for Excel (.xlsx/.xls), Python (.py), and text files
- Deterministic Python execution in a sandboxed environment
- Financial calculation specialization with proper currency formatting

## Environment Configuration

### Required API Keys (set in .env)

- `GEMINI_API_KEY` - Primary model (Gemini Flash 2.0)
- `HUGGINGFACE_TOKEN` - Fallback model and classification
- `KLUSTER_API_KEY` - Optional premium model access

### Model Fallback Chain

1. **Kluster.ai** (Qwen3-235B, Gemma3-27B) - Premium option
2. **Gemini Flash 2.0** - Primary production model
3. **Qwen 2.5-72B** - Reliable fallback via HuggingFace

## Key Design Patterns

### Anti-Hallucination Architecture

- **Tool result prioritization**: Always uses exact tool outputs over internal reasoning
- **Cross-validation**: Multiple verification methods for critical information
- **Source attribution**: Clear tracking and validation of information sources
- **Validation rules**: Type-specific answer extraction and verification

### Performance Optimizations

- **Fresh agent creation** for each question to avoid token accumulation
- **Concurrent processing** support with async operations
- **15-minute web cache** for improved response times
- **Exponential backoff** for API rate limiting

## File Organization

### Core Files

- `main.py` - Legacy monolithic solver (1285 lines)
- `main_refactored.py` - Entry point for refactored architecture
- `gaia_tools.py` - 42 specialized tools with robust error handling (4887 lines)
- `question_classifier.py` - LLM + pattern-based classification system
- `app.py` - Production Gradio interface with comprehensive error handling

### Supporting Files

- `async_complete_test.py` - Comprehensive async testing infrastructure
- `enhanced_wikipedia_tools.py` - Advanced Wikipedia research capabilities
- `universal_fen_correction.py` - Chess-specific FEN notation correction
- `wikipedia_featured_articles_by_date.py` - Date-specific Wikipedia searches

## Local Configuration Notes

- The HuggingFace token (`HUGGINGFACE_TOKEN`) can be obtained from the secrets stored in `.env`.
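
The exponential backoff mentioned under Performance Optimizations might look something like the sketch below. The helper name `retry_with_backoff`, its defaults, and the 10% jitter are assumptions made for illustration; the repository's actual retry logic may differ.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each failure (1s, 2s, 4s, ...) up to max_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Small random jitter (assumed 0-10%) so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the helper can be exercised in tests without real waiting; in normal use it defaults to `time.sleep`.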
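
Similarly, the 15-minute web cache could be approximated by a small time-to-live cache. The `TTLCache` class below is a hypothetical illustration under that assumption, not the project's implementation; the injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache fetched values per key and refetch once an entry is older than ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 15 * 60,  # 15 minutes, as stated in the notes above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], str]) -> str:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the network call
        value = fetch(key)
        self._store[key] = (now, value)
        return value
```

A caller would wrap its page-fetching function, for example `cache.get_or_fetch(url, download)`, where `download` is whatever function retrieves the page body.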