Orel MAZOR

🤖 Advanced GAIA Agents Challenge Solution

A solution for the Hugging Face Agents Course Unit 4 GAIA Challenge. It features multimodal AI agents with dynamic RAG capabilities, quantized models for Kaggle compatibility, and both synchronous and asynchronous execution modes.

🌟 Features

🧠 Dual Agent Architecture

  • Agent 1 (LlamaIndex): Advanced multimodal agent with dynamic knowledge base and hybrid reranking
  • Agent 2 (Smolagents): Gemini-powered agent with BM25 retrieval and observability
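
To illustrate what BM25 retrieval does for Agent 2, here is a minimal from-scratch sketch of Okapi BM25 scoring. It is not the repository's code: the function names, the whitespace tokenizer, and the `k1`/`b` defaults are assumptions chosen for illustration, and the actual agent may rely on a retrieval library instead.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer; a real pipeline would do more."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the eiffel tower is in paris",
    "bm25 ranks documents by term overlap",
    "paris is the capital of france",
]
scores = bm25_scores("capital of france", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score zero, so only the third document is retrieved here.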

Features for Agent 1

🎯 Multimodal Capabilities

  • BAAI Visualized Embedding: BGE-M3 based multimodal embeddings running on cuda:1
  • Pixtral 12B Quantized: FP8/4-bit quantized vision-language model for resource-constrained environments
  • Hybrid Retrieval: Text + visual content processing with ColPali and SentenceTransformer reranking
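
One common way to combine a text reranker with a visual reranker is to normalize both score lists and take a weighted sum. The sketch below shows that idea only; the `Candidate` class, the `hybrid_rerank` function, and the `text_weight` parameter are hypothetical names, and the scores are made-up numbers rather than real ColPali or SentenceTransformer outputs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    text_score: float    # e.g. from a text cross-encoder
    visual_score: float  # e.g. from a visual late-interaction model


def min_max(values: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so the two rerankers are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rerank(candidates: list[Candidate], text_weight: float = 0.5) -> list[str]:
    """Return doc ids sorted by a weighted sum of normalized scores."""
    if not candidates:
        return []
    text = min_max([c.text_score for c in candidates])
    visual = min_max([c.visual_score for c in candidates])
    fused = {
        c.doc_id: text_weight * t + (1 - text_weight) * v
        for c, t, v in zip(candidates, text, visual)
    }
    return sorted(fused, key=fused.get, reverse=True)


candidates = [
    Candidate("page-1", text_score=2.0, visual_score=10.0),
    Candidate("page-2", text_score=8.0, visual_score=40.0),
    Candidate("page-3", text_score=5.0, visual_score=50.0),
]
print(hybrid_rerank(candidates))
print(hybrid_rerank(candidates, text_weight=0.0))
```

Normalizing first matters because the two rerankers typically produce scores on different scales; without it, the larger scale would dominate the sum.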

⚡ Execution Modes

  • Asynchronous Mode: Concurrent question processing for maximum speed
  • Kaggle Compatibility: Optimized for resource-constrained environments
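
The asynchronous mode can be pictured as bounded concurrent execution with `asyncio`. The following is a minimal sketch, assuming a hypothetical `answer_question` coroutine that stands in for one agent call; the `max_concurrency` limit is likewise an assumption, useful on constrained hardware such as Kaggle.

```python
import asyncio


async def answer_question(question: str) -> str:
    """Hypothetical stand-in for one agent call (model + tools)."""
    await asyncio.sleep(0.01)  # simulate I/O-bound work
    return f"answer to: {question}"


async def answer_all(questions: list[str], max_concurrency: int = 4) -> list[str]:
    """Answer questions concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question: str) -> str:
        async with semaphore:
            return await answer_question(question)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(q) for q in questions))


questions = [f"question {i}" for i in range(10)]
answers = asyncio.run(answer_all(questions))
print(len(answers), answers[0])
```

The semaphore caps how many questions are in flight at once, which trades some speed for lower peak memory and fewer rate-limit errors.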

🔍 Advanced RAG System

  • Dynamic Knowledge Base: Automatically updated with web search results
  • Multimodal Parsing: Handles text, images, PDFs, audio, and video files
  • Smart Reranking: Hybrid approach combining text and visual rerankers
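
The idea of a knowledge base that grows with web search results can be sketched as a store that deduplicates incoming text before indexing it. This toy version is an assumption-heavy illustration: `DynamicKnowledgeBase`, `add_search_results`, and `search` are hypothetical names, and keyword matching stands in for the vector retrieval a real system would use.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DynamicKnowledgeBase:
    """Toy in-memory store that grows as search results arrive."""
    documents: dict[str, str] = field(default_factory=dict)

    def add_search_results(self, results: list[str]) -> int:
        """Add unseen results; return how many were actually new."""
        added = 0
        for text in results:
            # Hash the normalized text so near-identical copies collapse.
            key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
            if key not in self.documents:
                self.documents[key] = text.strip()
                added += 1
        return added

    def search(self, query: str) -> list[str]:
        """Naive keyword match standing in for vector retrieval."""
        terms = query.lower().split()
        return [d for d in self.documents.values() if any(t in d.lower() for t in terms)]


kb = DynamicKnowledgeBase()
first = kb.add_search_results(["GAIA is a benchmark for general AI assistants."])
second = kb.add_search_results([
    "gaia is a benchmark for general AI assistants.  ",  # duplicate after normalization
    "BM25 is a ranking function used in search.",
])
print(first, second, kb.search("benchmark"))
```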

🏗️ Architecture

┌─────────────────────────────────────────────┐
│                  APP                        │
│            (Async/Sync Modes)               │
└─────────────────┬───────────────────────────┘
                  │
         ┌────────┴────────┐
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Agent 1  │       │Agent 2  │
    │LlamaIdx │       │Smolagent│
    └────┬────┘       └────┬────┘
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Dynamic  │       │BM25 +   │
    │RAG +    │       │Langfuse │
    │Hybrid   │       │Observ.  │
    │Rerank   │       │         │
    └─────────┘       └─────────┘

🚀 Quick Start

Prerequisites

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/gaia-agents-challenge
cd gaia-agents-challenge
  2. Install FlagEmbedding with visual support:
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding/research/visual_bge
pip install -e .
cd ../../..
  3. Install additional dependencies:

For Agent 1:

pip install -r requirements.txt

For Agent 2:

pip install -r requirements2.txt
  4. Set environment variables:
export GOOGLE_API_KEY="your_gemini_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
export LANGFUSE_PUBLIC_KEY="your_langfuse_public_key"  # Optional
export LANGFUSE_SECRET_KEY="your_langfuse_secret_key"  # Optional
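
Before launching an agent, it can help to confirm that these variables are actually set. The check below is a small sketch: it treats the two Langfuse keys as optional, as marked above, and assumes the other two are required. The `check_environment` helper is hypothetical and not part of the repository.

```python
import os

REQUIRED = ("GOOGLE_API_KEY", "HUGGINGFACEHUB_API_TOKEN")
OPTIONAL = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def check_environment(env=None) -> tuple[list[str], list[str]]:
    """Return (missing required, missing optional) variable names."""
    env = os.environ if env is None else env
    # An empty string counts as missing, same as an unset variable.
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_optional = [name for name in OPTIONAL if not env.get(name)]
    return missing_required, missing_optional


missing_required, missing_optional = check_environment()
if missing_required:
    print("Missing required variables:", ", ".join(missing_required))
if missing_optional:
    print("Langfuse tracing may be disabled; missing:", ", ".join(missing_optional))
```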

Usage

# LlamaIndex Agent
python agent.py

# Smolagents Agent
python agent2.py

📁 Project Structure

├── agent.py                # LlamaIndex-based agent with dynamic RAG
├── agent2.py               # Smolagents-based agent with observability
├── appasync.py             # Original async Gradio interface
├── app.py                  # Original sync Gradio interface
├── custom_models.py        # Custom model implementations
├── requirements.txt        # Python dependencies
├── README.md               # This file

🧪 Testing

Run Individual Components

# Test BAAI embedding
python -c "from custom_models import BaaiMultimodalEmbedding; print('BAAI OK')"

# Test Pixtral quantized
python -c "from custom_models import PixtralQuantizedLLM; print('Pixtral OK')"

# Test agents
python agent.py
python agent2.py

Run GAIA Evaluation

# Through the web interface
python app.py

# Or programmatically
python -c "
from agent2 import GAIAAgent
agent = GAIAAgent()
result = agent.solve_gaia_question({'Question': 'Test question', 'task_id': 'test'})
print(result)
"

🔧 Customization

Adding New Models

  1. Create a new class in custom_models.py
  2. Implement the required interfaces
  3. Update the agent configuration
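
The three steps above can be sketched as follows. The interface shown is hypothetical: `TextEmbedder`, `CharCountEmbedder`, and `MODEL_REGISTRY` are illustrative names only, and the interfaces that `custom_models.py` actually requires depend on the agent framework in use.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextEmbedder(Protocol):
    """Hypothetical interface; the real one depends on the agent framework."""

    def embed(self, text: str) -> list[float]: ...


# Steps 1 and 2: a new class that implements the interface.
class CharCountEmbedder:
    """Toy model: embeds text as [length, vowel count]."""

    def embed(self, text: str) -> list[float]:
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels)]


# Step 3: a registry the agent configuration could select from by name.
MODEL_REGISTRY: dict[str, TextEmbedder] = {"char-count": CharCountEmbedder()}

model = MODEL_REGISTRY["char-count"]
assert isinstance(model, TextEmbedder)
print(model.embed("GAIA"))
```

Using a structural `Protocol` means a new model only needs the right method, not a shared base class, which keeps experimental models easy to swap in.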

Modifying RAG Behavior

  • Edit DynamicQueryEngineManager in agent.py
  • Adjust reranking strategies in HybridReranker
  • Configure search parameters in enhanced_web_search_tool

UI Customization

  • Modify app_unified.py for interface changes
  • Add new execution modes
  • Integrate additional observability tools

🐛 Troubleshooting

Common Issues

Model Loading Failures

  • Check internet connectivity for model downloads
  • Verify HuggingFace token permissions
  • Clear model cache: rm -rf ~/.cache/huggingface/

Visual BGE Import Errors

# Ensure proper installation
cd FlagEmbedding/research/visual_bge
pip install -e .

🔗 References