# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a **production-ready GAIA benchmark AI agent** that achieves 85% accuracy using a multi-agent architecture. The system has been **refactored** into a modular `gaia/` package, with the legacy monolithic solver kept alongside it. It specializes in complex question answering across multimedia, research, file processing, chess analysis, and mathematical reasoning domains.

## Development Commands

### Setup and Installation
```bash
# Install dependencies
pip install -r requirements.txt

# Test API key configuration
python test_api_keys.py

# Verify core functionality
python -c "from main import GAIASolver; print('✅ Core GAIASolver available')"
```

### Running the System
```bash
# Run legacy monolithic solver
python main.py

# Run refactored modular solver (recommended)
python main_refactored.py

# Run Gradio web interface
python app.py
```

### Testing Commands
```bash
# Comprehensive async testing
python async_complete_test.py

# Test question classification
python test_improved_classification.py
python final_classification_test.py

# Test YouTube functionality
python direct_youtube_test.py
python simple_youtube_test.py
python test_youtube_question.py

# Test individual components
python -c "from gaia_tools import GAIA_TOOLS; print(f'Available tools: {len(GAIA_TOOLS)}')"
python -c "from question_classifier import QuestionClassifier; c = QuestionClassifier(); print('✅ Classifier ready')"
```

## Architecture Overview

### Dual Architecture Design

This project maintains both **legacy monolithic** and **refactored modular** architectures:

**Legacy Architecture (main.py):**
- Monolithic 1285-line solver with all functionality integrated
- Comprehensive tool collection in gaia_tools.py (4887 lines)
- Single-file approach for rapid development and deployment

**Refactored Architecture (gaia/ package):**
```
gaia/
├── core/           # Main solver logic
│   ├── solver.py           # GAIASolver main class
│   ├── answer_extractor.py # Specialized answer extraction classes
│   └── question_processor.py # Question classification and processing
├── tools/          # Tool implementations  
│   ├── base.py            # Abstract tool interface and registry
│   ├── registry.py        # Tool discovery and management
│   └── [specialized tool modules]
├── models/         # Model providers and management
│   ├── manager.py         # ModelManager with fallback chains
│   └── providers.py       # LiteLLM, Gemini, Kluster providers
├── config/         # Configuration management
│   └── settings.py        # Config, ModelConfig classes
└── utils/          # Utilities and helpers
    ├── exceptions.py      # Custom exception hierarchy
    └── logging.py         # Logging configuration
```
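
As a rough illustration of the "abstract tool interface and registry" idea behind `gaia/tools/base.py`, here is a minimal sketch. The names `BaseTool`, `ToolRegistry`, `EchoTool`, and their methods are hypothetical and chosen for this example; the actual classes in the package may look different.

```python
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Hypothetical abstract tool interface (not the project's actual class)."""

    name: str = ""

    @abstractmethod
    def run(self, query: str) -> str:
        """Execute the tool and return its result as text."""


class ToolRegistry:
    """Hypothetical registry that maps tool names to tool instances."""

    def __init__(self) -> None:
        self._tools: dict[str, BaseTool] = {}

    def register(self, tool: BaseTool) -> None:
        # Reject unnamed or duplicate tools so lookups stay unambiguous.
        if not tool.name:
            raise ValueError("tool must define a non-empty name")
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> BaseTool:
        try:
            return self._tools[name]
        except KeyError:
            raise KeyError(f"unknown tool: {name}") from None

    def names(self) -> list[str]:
        return sorted(self._tools)


class EchoTool(BaseTool):
    """Toy tool used only to demonstrate registration."""

    name = "echo"

    def run(self, query: str) -> str:
        return query
```

Registering `EchoTool()` and then calling `registry.get("echo").run("hi")` returns the input unchanged. Keeping lookup behind a registry means the solver can depend on tool names rather than on concrete tool modules.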

### Core Components

- **GAIASolver (main.py):** Legacy monolithic solver (1285 lines) containing all processing logic
- **GAIASolver (gaia/core/solver.py):** Refactored main orchestrator using dependency injection
- **QuestionClassifier:** LLM-based routing with pattern-based fallbacks
- **GAIA_TOOLS:** 42 specialized tools including enhanced Wikipedia research, chess analysis, Excel processing, and multimedia analysis
- **ModelManager:** Handles model initialization, fallback chains (Kluster.ai → Gemini → Qwen), and lifecycle management

### Question Type Specialization

**Research Questions (92% accuracy):**
- Enhanced Wikipedia tools with date-specific searches and Featured Articles integration
- Multi-step research coordination with cross-validation
- Anti-hallucination safeguards to prevent fabrication

**Chess Questions (100% accuracy):**
- Universal FEN correction system handling any vision error pattern
- Multi-tool consensus system for maximum accuracy
- Perfect algebraic notation extraction
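
One simple way to picture a multi-tool consensus step is a majority vote over the moves that different analysis tools propose. The sketch below assumes each tool returns a move in algebraic notation (or `None` on failure). The function name `consensus_move` and the `min_votes` threshold are hypothetical and not taken from the project.

```python
from collections import Counter
from typing import Optional


def consensus_move(candidates: list[Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Return the move proposed by the most tools, or None without enough agreement.

    Hypothetical helper: ties are broken in favor of the move seen first.
    """
    # Ignore tools that failed (None) or returned blank output.
    votes = Counter(move.strip() for move in candidates if move and move.strip())
    if not votes:
        return None
    move, count = votes.most_common(1)[0]
    return move if count >= min_votes else None
```

For example, `consensus_move(["Rd5", "Rd5", "Qh4"])` picks `"Rd5"`, while three tools that all disagree yield `None`, which a caller could treat as a signal to re-run the analysis.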

**YouTube/Multimedia Questions:**
- Enhanced URL detection with multiple regex patterns
- Forced classification override for YouTube content
- Specialized prompts with explicit tool usage instructions
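
The URL detection and forced override described above might be sketched as follows. The regex patterns, the `"multimedia"` label, and the function names are illustrative assumptions; the patterns used in `question_classifier.py` may differ.

```python
import re
from typing import Optional

# Illustrative patterns only; the project's actual regexes may differ.
YOUTUBE_PATTERNS = [
    re.compile(r"(?:https?://)?(?:www\.|m\.)?youtube\.com/watch\?(?:[^\s#]*&)?v=([\w-]{11})"),
    re.compile(r"(?:https?://)?youtu\.be/([\w-]{11})"),
    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/(?:shorts|embed)/([\w-]{11})"),
]


def find_youtube_id(text: str) -> Optional[str]:
    """Return the first 11-character video ID found in `text`, if any."""
    for pattern in YOUTUBE_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


def classify_with_override(question: str, llm_label: str) -> str:
    """Force the (hypothetical) 'multimedia' route when a YouTube URL is present."""
    return "multimedia" if find_youtube_id(question) else llm_label
```

With this shape, a question containing `https://youtu.be/<id>` is routed to the multimedia path even if the LLM classifier labeled it as research, and questions without a YouTube link keep the LLM's label.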

**File Processing (100% accuracy):**
- Format-specific tools for Excel (.xlsx/.xls), Python (.py), text files
- Deterministic Python execution with sandboxed environment
- Financial calculation specialization with proper currency formatting

## Environment Configuration

### Required API Keys (set in .env)
- `GEMINI_API_KEY` - Primary model (Gemini Flash 2.0)
- `HUGGINGFACE_TOKEN` - Fallback model and classification
- `KLUSTER_API_KEY` - Optional premium model access
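
A quick way to check the keys above before running the solver is sketched below. It assumes the variables from `.env` have already been loaded into the process environment (for example by a dotenv loader, which is an assumption here), and it treats `KLUSTER_API_KEY` as optional. The helper name `missing_keys` is hypothetical and is not the contents of `test_api_keys.py`.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("GEMINI_API_KEY", "HUGGINGFACE_TOKEN")
OPTIONAL_KEYS = ("KLUSTER_API_KEY",)  # absence is not reported as an error


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required keys that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing required keys: " + ", ".join(missing))
    else:
        print("All required keys are set")
```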

### Model Fallback Chain
1. **Kluster.ai** (Qwen3-235B, Gemma3-27B) - Premium option
2. **Gemini Flash 2.0** - Primary production model
3. **Qwen 2.5-72B** - Reliable fallback via HuggingFace
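
The chain above amounts to "try each provider in order and return the first successful response". A minimal sketch of that pattern follows; the function `complete_with_fallback` and the provider-as-callable shape are assumptions for illustration and do not reflect the actual `ModelManager` API.

```python
from typing import Callable, Sequence

# A provider is a (name, callable) pair; the callable maps a prompt to a response.
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Passing providers in the order Kluster.ai, Gemini, Qwen reproduces the documented priority. Collecting the per-provider errors makes the final failure message easier to debug than re-raising only the last exception.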

## Key Design Patterns

### Anti-Hallucination Architecture
- **Tool result prioritization**: Always uses exact tool outputs over internal reasoning
- **Cross-validation**: Multiple verification methods for critical information  
- **Source attribution**: Clear tracking and validation of information sources
- **Validation rules**: Type-specific answer extraction and verification

### Performance Optimizations
- **Fresh agent creation** for each question to avoid token accumulation
- **Concurrent processing** support with async operations  
- **15-minute web cache** for improved response times
- **Exponential backoff** for API rate limiting
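
Two of the optimizations above, exponential backoff and a 15-minute cache, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the project's implementation: `retry_with_backoff`, `TTLCache`, and all parameter names and defaults (other than the 15-minute lifetime) are hypothetical.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call`, doubling the delay after each failure and adding up to 10% jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds` (900 s = 15 minutes)."""

    def __init__(self, ttl_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)
```

The `sleep` and `clock` parameters exist so the behavior can be tested without real waiting. In normal use the defaults (`time.sleep` and `time.monotonic`) apply.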

## File Organization

### Core Files
- `main.py` - Legacy monolithic solver (1285 lines)
- `main_refactored.py` - Entry point for refactored architecture
- `gaia_tools.py` - 42 specialized tools with robust error handling (4887 lines)
- `question_classifier.py` - LLM + pattern-based classification system
- `app.py` - Production Gradio interface with comprehensive error handling

### Supporting Files
- `async_complete_test.py` - Comprehensive async testing infrastructure
- `enhanced_wikipedia_tools.py` - Advanced Wikipedia research capabilities
- `universal_fen_correction.py` - Chess-specific FEN notation correction
- `wikipedia_featured_articles_by_date.py` - Date-specific Wikipedia searches

## Local Configuration Notes
- The Hugging Face token (`HUGGINGFACE_TOKEN`) can be read from the secrets in `.env`.