# Advanced GAIA Agents Challenge Solution
A comprehensive solution for the [Hugging Face Agents Course Unit 4 GAIA Challenge](https://huggingface.co/learn/agents-course/unit4/hands-on), featuring advanced multimodal AI agents with dynamic RAG capabilities, quantized models for Kaggle compatibility, and both synchronous/asynchronous execution modes.
## Features
### Dual Agent Architecture
- **Agent 1 (LlamaIndex)**: Advanced multimodal agent with dynamic knowledge base and hybrid reranking
- **Agent 2 (Smolagents)**: Gemini-powered agent with BM25 retrieval and observability
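To make the BM25 retrieval used by Agent 2 concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is an illustration only, not the code in `agent2.py`, which may rely on a library retriever instead. The function names, the whitespace tokenizer, and the defaults `k1=1.5` and `b=0.75` are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that share no term with the query score zero, so sorting the documents by these scores in descending order gives a simple lexical ranking.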
### Features for Agent 1
### Multimodal Capabilities
- **BAAI Visualized Embedding**: BGE-M3 based multimodal embeddings running on `cuda:1`
- **Pixtral 12B Quantized**: FP8/4-bit quantized vision-language model for resource-constrained environments
- **Hybrid Retrieval**: Text + visual content processing with ColPali and SentenceTransformer reranking
### Execution Modes
- **Asynchronous Mode**: Concurrent question processing for maximum speed
- **Kaggle Compatibility**: Optimized for resource-constrained environments
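The difference between the two execution modes can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not the code in `appasync.py`: `answer_question` is a hypothetical stand-in for a real agent call, and the `max_concurrency` limit is an assumed safeguard for resource-constrained environments.

```python
import asyncio


async def answer_question(question):
    # Hypothetical placeholder for an agent call; the sleep mimics I/O latency.
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def run_all(questions, max_concurrency=4):
    """Answer all questions concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question):
        async with semaphore:
            return await answer_question(question)

    # gather returns results in the same order as the input questions.
    return await asyncio.gather(*(bounded(q) for q in questions))


def run_sync(questions):
    """Answer questions one at a time (synchronous mode)."""
    return [asyncio.run(answer_question(q)) for q in questions]
```

Calling `asyncio.run(run_all(questions))` processes the questions concurrently, while `run_sync(questions)` processes them sequentially. The semaphore bounds how many questions are in flight at once, which can help on limited hardware.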
### Advanced RAG System
- **Dynamic Knowledge Base**: Automatically updated with web search results
- **Multimodal Parsing**: Handles text, images, PDFs, audio, and video files
- **Smart Reranking**: Hybrid approach combining text and visual rerankers
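One common way to combine a text reranker with a visual reranker is weighted score fusion, sketched below. The repository's own reranker may work differently. The function names, the min-max normalization, and the `text_weight` default are assumptions made for illustration.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] so the two rerankers become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so this reranker expresses no preference.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rerank(candidates, text_scores, visual_scores, text_weight=0.6, top_k=3):
    """Fuse text and visual scores and return the top_k (candidate, score) pairs."""
    if not candidates:
        return []
    text_norm = min_max_normalize(text_scores)
    visual_norm = min_max_normalize(visual_scores)
    fused = [
        text_weight * t + (1 - text_weight) * v
        for t, v in zip(text_norm, visual_norm)
    ]
    ranked = sorted(zip(candidates, fused), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

Normalizing each score list before fusing keeps a reranker with a larger raw score range from dominating the result. The weight then controls how much the text signal counts relative to the visual one.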
## Architecture
```
┌─────────────────────────────────────────────┐
│                     APP                     │
│             (Async/Sync Modes)              │
└─────────────────┬───────────────────────────┘
                  │
         ┌────────┴────────┐
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Agent 1  │       │Agent 2  │
    │LlamaIdx │       │Smolagent│
    └────┬────┘       └────┬────┘
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Dynamic  │       │BM25 +   │
    │RAG +    │       │Langfuse │
    │Hybrid   │       │Observ.  │
    │Rerank   │       │         │
    └─────────┘       └─────────┘
```
## Quick Start
### Prerequisites
### Installation
1. **Clone the repository**:
```bash
git clone https://github.com/yourusername/gaia-agents-challenge
cd gaia-agents-challenge
```
2. **Install FlagEmbedding with visual support**:
```bash
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding/research/visual_bge
pip install -e .
cd ../../..
```
3. **Install additional dependencies**:
#### For Agent 1:
```bash
pip install -r requirements.txt
```
#### For Agent 2:
```bash
pip install -r requirements2.txt
```
4. **Set environment variables**:
```bash
export GOOGLE_API_KEY="your_gemini_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
export LANGFUSE_PUBLIC_KEY="your_langfuse_public_key"  # Optional
export LANGFUSE_SECRET_KEY="your_langfuse_secret_key"  # Optional
```
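Before launching an agent, it can help to confirm that these variables are actually set. The helper below is a small sketch and is not part of the repository. `check_environment` is a hypothetical name, and treating the two variables not marked optional above as required is an assumption.

```python
import os

# Assumed to be required because the export list does not mark them optional.
REQUIRED_VARS = ("GOOGLE_API_KEY", "HUGGINGFACEHUB_API_TOKEN")
# Marked "Optional" in the export list.
OPTIONAL_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def check_environment(env=None):
    """Return (missing_required, missing_optional) lists of variable names."""
    env = os.environ if env is None else env
    # Unset and empty values are both treated as missing.
    missing_required = [name for name in REQUIRED_VARS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing_required, missing_optional
```

Calling `check_environment()` with no arguments inspects the current process environment. A non-empty first list means an agent would likely fail when it tries to authenticate.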
### Usage
```bash
# LlamaIndex Agent
python agent.py
# Smolagents Agent
python agent2.py
```
## Project Structure
```
├── agent.py              # LlamaIndex-based agent with dynamic RAG
├── agent2.py             # Smolagents-based agent with observability
├── appasync.py           # Original async Gradio interface
├── app.py                # Original sync Gradio interface
├── custom_models.py      # Custom model implementations
├── requirements.txt      # Python dependencies
└── README.md             # This file
```
## Testing
### Run Individual Components
```bash
# Test BAAI embedding
python -c "from custom_models import BaaiMultimodalEmbedding; print('BAAI OK')"
# Test Pixtral quantized
python -c "from custom_models import PixtralQuantizedLLM; print('Pixtral OK')"
# Test agents
python agent.py
python agent2.py
```
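The one-line import checks above can also be collected into a single reusable function that reports every failure instead of stopping at the first one. This is a sketch using only the standard library. `check_imports` is a hypothetical helper, not something the repository provides.

```python
import importlib


def check_imports(targets):
    """Check that each module imports and exposes the expected attributes.

    targets maps a module name to a list of attribute names expected in it.
    Returns a dict mapping each module name to a short status string.
    """
    report = {}
    for module_name, attributes in targets.items():
        try:
            module = importlib.import_module(module_name)
        except Exception as exc:  # ImportError or any error raised at import time
            report[module_name] = f"import failed: {exc!r}"
            continue
        missing = [attr for attr in attributes if not hasattr(module, attr)]
        report[module_name] = "OK" if not missing else f"missing: {', '.join(missing)}"
    return report
```

For this project, a call such as `check_imports({"custom_models": ["BaaiMultimodalEmbedding", "PixtralQuantizedLLM"]})` would cover both model classes in one pass.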
### Run GAIA Evaluation
```bash
# Through the web interface
python app.py
# Or programmatically
python -c "
from agent2 import GAIAAgent
agent = GAIAAgent()
result = agent.solve_gaia_question({'Question': 'Test question', 'task_id': 'test'})
print(result)
"
```
## Customization
### Adding New Models
1. Create a new class in `custom_models.py`
2. Implement the required interfaces
3. Update the agent configuration
### Modifying RAG Behavior
- Edit `DynamicQueryEngineManager` in `agent.py`
- Adjust reranking strategies in `HybridReranker`
- Configure search parameters in `enhanced_web_search_tool`
### UI Customization
- Modify `app_unified.py` for interface changes
- Add new execution modes
- Integrate additional observability tools
## Troubleshooting
### Common Issues
#### Model Loading Failures
- Check internet connectivity for model downloads
- Verify HuggingFace token permissions
- Clear model cache: `rm -rf ~/.cache/huggingface/`
#### Visual BGE Import Errors
```bash
# Ensure proper installation
cd FlagEmbedding/research/visual_bge
pip install -e .
```
## References
- [GAIA Benchmark](https://huggingface.co/datasets/gaia-benchmark/GAIA)
- [LlamaIndex](https://github.com/run-llama/llama_index)
- [BGE Models](https://github.com/FlagOpen/FlagEmbedding)
- [Gradio](https://github.com/gradio-app/gradio)