Advanced GAIA Agents Challenge Solution
A comprehensive solution for the Hugging Face Agents Course Unit 4 GAIA Challenge, featuring advanced multimodal AI agents with dynamic RAG capabilities, quantized models for Kaggle compatibility, and both synchronous and asynchronous execution modes.
Features
Dual Agent Architecture
- Agent 1 (LlamaIndex): Advanced multimodal agent with dynamic knowledge base and hybrid reranking
- Agent 2 (Smolagents): Gemini-powered agent with BM25 retrieval and observability
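To make the BM25 retrieval mentioned for Agent 2 concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is an illustration only, not the code in agent2.py: the function name `bm25_scores`, the whitespace tokenizer, and the `k1`/`b` defaults are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is length-normalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "GAIA benchmark questions require multi-step reasoning",
    "BM25 ranks documents by term frequency and rarity",
    "Gradio builds web interfaces for machine learning demos",
]
scores = bm25_scores("BM25 term frequency", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score exactly zero, so only the second document is a candidate here.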
Features for Agent 1
Multimodal Capabilities
- BAAI Visualized Embedding: BGE-M3 based multimodal embeddings running on cuda:1
- Pixtral 12B Quantized: FP8/4-bit quantized vision-language model for resource-constrained environments
- Hybrid Retrieval: Text + visual content processing with ColPali and SentenceTransformer reranking
Execution Modes
- Asynchronous Mode: Concurrent question processing for maximum speed
- Kaggle Compatibility: Optimized for resource-constrained environments
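Concurrent question processing can be sketched with `asyncio`. The snippet below is a hedged illustration, not the logic in appasync.py: `answer_question` is a stand-in for a real agent call, and the `max_concurrency` limit is an assumed knob for keeping memory and API usage bounded on a constrained machine.

```python
import asyncio


async def answer_question(question: str) -> str:
    """Placeholder for a real agent call (hypothetical)."""
    await asyncio.sleep(0.01)  # simulates model or network latency
    return f"answer to: {question}"


async def answer_all(questions, max_concurrency=4):
    """Answer all questions concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question):
        async with semaphore:
            return await answer_question(question)

    # gather() returns results in the same order as the input questions.
    return await asyncio.gather(*(bounded(q) for q in questions))


questions = ["Q1", "Q2", "Q3"]
answers = asyncio.run(answer_all(questions))
print(answers)
```

Lowering `max_concurrency` trades speed for a smaller peak footprint, which may matter more than raw throughput on Kaggle-sized hardware.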
Advanced RAG System
- Dynamic Knowledge Base: Automatically updated with web search results
- Multimodal Parsing: Handles text, images, PDFs, audio, and video files
- Smart Reranking: Hybrid approach combining text and visual rerankers
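One common way to combine a text reranker with a visual reranker is weighted score fusion after normalization. The sketch below shows that idea under stated assumptions; it is not the project's HybridReranker, and the names `min_max`, `hybrid_rerank`, and `text_weight` are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rerank(text_scores, visual_scores, text_weight=0.6, top_k=3):
    """Fuse two rerankers' scores and return the top_k document ids.

    A document missing from one reranker contributes 0 for that modality.
    """
    text_n, visual_n = min_max(text_scores), min_max(visual_scores)
    fused = {
        doc_id: text_weight * text_n.get(doc_id, 0.0)
        + (1 - text_weight) * visual_n.get(doc_id, 0.0)
        for doc_id in set(text_n) | set(visual_n)
    }
    # Sort by fused score (descending), then by id for a stable order.
    return sorted(fused, key=lambda d: (-fused[d], d))[:top_k]


# Raw scores on different scales, as two independent rerankers might emit.
text_scores = {"doc_a": 2.0, "doc_b": 8.0, "doc_c": 5.0}
visual_scores = {"doc_a": 0.9, "doc_c": 0.5, "doc_d": 0.1}
ranked = hybrid_rerank(text_scores, visual_scores)
print(ranked)
```

Normalizing first matters because the two rerankers score on unrelated scales; without it, the reranker with the larger raw numbers would dominate regardless of the weight.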
Architecture
┌─────────────────────────────────────────────┐
│                     APP                     │
│             (Async/Sync Modes)              │
└─────────────────┬───────────────────────────┘
                  │
         ┌────────┴────────┐
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Agent 1  │       │Agent 2  │
    │LlamaIdx │       │Smolagent│
    └────┬────┘       └────┬────┘
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Dynamic  │       │BM25 +   │
    │RAG +    │       │Langfuse │
    │Hybrid   │       │Observ.  │
    │Rerank   │       │         │
    └─────────┘       └─────────┘
Quick Start
Prerequisites
Installation
- Clone the repository:
git clone https://github.com/yourusername/gaia-agents-challenge
cd gaia-agents-challenge
- Install FlagEmbedding with visual support:
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding/research/visual_bge
pip install -e .
cd ../../..
- Install additional dependencies:
For Agent 1:
pip install -r requirements.txt
For Agent 2:
pip install -r requirements2.txt
- Set environment variables:
export GOOGLE_API_KEY="your_gemini_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
export LANGFUSE_PUBLIC_KEY="your_langfuse_public_key" # Optional
export LANGFUSE_SECRET_KEY="your_langfuse_secret_key" # Optional
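Before launching either agent, it can help to confirm that the variables above are actually set. The following is a small sketch, not part of the repository: `check_environment` is a hypothetical helper, and treating the two non-Langfuse keys as required is an assumption based on only the Langfuse keys being marked optional.

```python
import os

# Assumption: keys not marked "Optional" in the export list are required.
REQUIRED = ("GOOGLE_API_KEY", "HUGGINGFACEHUB_API_TOKEN")
OPTIONAL = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def check_environment(env=None):
    """Return the names of unset or empty variables, grouped by importance."""
    env = os.environ if env is None else env
    return {
        "missing_required": [name for name in REQUIRED if not env.get(name)],
        "missing_optional": [name for name in OPTIONAL if not env.get(name)],
    }


# Demo with an explicit mapping; call check_environment() with no
# arguments to inspect the real process environment instead.
report = check_environment({"GOOGLE_API_KEY": "your_gemini_api_key"})
print(report)
```

A launcher script could refuse to start when `missing_required` is non-empty and merely warn about `missing_optional`.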
Usage
# LlamaIndex Agent
python agent.py
# Smolagents Agent
python agent2.py
Project Structure
├── agent.py              # LlamaIndex-based agent with dynamic RAG
├── agent2.py             # Smolagents-based agent with observability
├── appasync.py           # Original async Gradio interface
├── app.py                # Original sync Gradio interface
├── custom_models.py      # Custom model implementations
├── requirements.txt      # Python dependencies
└── README.md             # This file
Testing
Run Individual Components
# Test BAAI embedding
python -c "from custom_models import BaaiMultimodalEmbedding; print('BAAI OK')"
# Test Pixtral quantized
python -c "from custom_models import PixtralQuantizedLLM; print('Pixtral OK')"
# Test agents
python agent.py
python agent2.py
Run GAIA Evaluation
# Through the web interface
python app.py
# Or programmatically
python -c "
from agent2 import GAIAAgent
agent = GAIAAgent()
result = agent.solve_gaia_question({'Question': 'Test question', 'task_id': 'test'})
print(result)
"
Customization
Adding New Models
- Create a new class in custom_models.py
- Implement the required interfaces
- Update the agent configuration
Modifying RAG Behavior
- Edit DynamicQueryEngineManager in agent.py
- Adjust reranking strategies in HybridReranker
- Configure search parameters in enhanced_web_search_tool
UI Customization
- Modify app_unified.py for interface changes
- Add new execution modes
- Integrate additional observability tools
Troubleshooting
Common Issues
Model Loading Failures
- Check internet connectivity for model downloads
- Verify HuggingFace token permissions
- Clear model cache:
rm -rf ~/.cache/huggingface/
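Deleting the whole cache forces every model to download again, so it may be worth checking which entries take up space first. The sketch below uses only the standard library; `cache_report` is a hypothetical helper, and the demo runs against a temporary directory rather than a real cache. It assumes the default Hugging Face layout, where downloaded models live in per-model directories under ~/.cache/huggingface/hub (the location can differ if HF_HOME is set).

```python
import tempfile
from pathlib import Path


def dir_size_bytes(path):
    """Total size of regular files under `path`, skipping symlinks
    so that linked blobs are not counted twice."""
    return sum(
        f.stat().st_size
        for f in Path(path).rglob("*")
        if f.is_file() and not f.is_symlink()
    )


def cache_report(cache_root):
    """Map each top-level directory under `cache_root` to its size in bytes."""
    root = Path(cache_root).expanduser()
    if not root.is_dir():
        return {}
    return {
        child.name: dir_size_bytes(child)
        for child in sorted(root.iterdir())
        if child.is_dir()
    }


# Demo on a throwaway directory that mimics one cached model.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "models--example--model"
    model_dir.mkdir()
    (model_dir / "weights.bin").write_bytes(b"\0" * 1024)
    report = cache_report(tmp)
print(report)
```

Pointing `cache_report` at the hub directory and removing only the directory of the model that fails to load is usually less disruptive than clearing everything.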
Visual BGE Import Errors
# Ensure proper installation
cd FlagEmbedding/research/visual_bge
pip install -e .