# 🤖 Advanced GAIA Agents Challenge Solution
A comprehensive solution for the [Hugging Face Agents Course Unit 4 GAIA Challenge](https://huggingface.co/learn/agents-course/unit4/hands-on), featuring advanced multimodal AI agents with dynamic RAG capabilities, quantized models for Kaggle compatibility, and both synchronous and asynchronous execution modes.
## 🌟 Features
### 🧠 Dual Agent Architecture
- **Agent 1 (LlamaIndex)**: Advanced multimodal agent with dynamic knowledge base and hybrid reranking
- **Agent 2 (Smolagents)**: Gemini-powered agent with BM25 retrieval and observability
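
To make the BM25 retrieval mentioned for Agent 2 concrete, here is a minimal, dependency-free sketch of Okapi BM25 scoring. It is an illustration only, not the retriever that `agent2.py` uses; the function name `bm25_scores`, the whitespace tokenizer, and the default `k1` and `b` values are assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (illustrative sketch)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    if avg_len == 0:
        return [0.0] * n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "quantum physics lecture"]
scores = bm25_scores("cat mat", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

Because matching is purely lexical, "cats" in the second document does not match the query term "cat"; this is the kind of gap that embedding-based retrieval is meant to cover.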
### Features for Agent 1
### 🎯 Multimodal Capabilities
- **BAAI Visualized Embedding**: BGE-M3 based multimodal embeddings running on cuda:1
- **Pixtral 12B Quantized**: FP8/4-bit quantized vision-language model for resource-constrained environments
- **Hybrid Retrieval**: Text + visual content processing with ColPali and SentenceTransformer reranking
### ⚡ Execution Modes
- **Asynchronous Mode**: Concurrent question processing for maximum speed
- **Kaggle Compatibility**: Optimized for resource-constrained environments
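
As a rough illustration of concurrent question processing, the sketch below runs several questions through an agent at once while capping the number in flight, which matters on resource-constrained hardware. `answer_question` is a hypothetical stand-in for a real agent call, and `max_concurrency` is an assumed parameter; neither name is taken from this repository.

```python
import asyncio


async def answer_question(question: str) -> str:
    """Hypothetical stand-in for a real agent call."""
    await asyncio.sleep(0.01)  # simulates I/O-bound work such as an API request
    return f"answer to: {question}"


async def answer_all(questions, max_concurrency=4):
    """Answer questions concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question):
        # The semaphore limits how many agent calls run at the same time.
        async with semaphore:
            return await answer_question(question)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(q) for q in questions))


if __name__ == "__main__":
    print(asyncio.run(answer_all(["Q1", "Q2", "Q3"])))
```

Lowering `max_concurrency` trades speed for a smaller peak memory and rate-limit footprint.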
### 🔍 Advanced RAG System
- **Dynamic Knowledge Base**: Automatically updated with web search results
- **Multimodal Parsing**: Handles text, images, PDFs, audio, and video files
- **Smart Reranking**: Hybrid approach combining text and visual rerankers
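
One common way to combine a text reranker with a visual reranker is weighted score fusion. The sketch below assumes each reranker has already produced one score per candidate; it is not the `HybridReranker` implementation from `agent.py`, and the names `hybrid_rerank`, `text_weight`, and `top_k` are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] so the two rerankers are comparable."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rerank(candidates, text_scores, visual_scores, text_weight=0.5, top_k=3):
    """Fuse text and visual scores with a weighted sum and return the top_k pairs."""
    text_norm = min_max_normalize(text_scores)
    visual_norm = min_max_normalize(visual_scores)
    fused = [
        text_weight * t + (1 - text_weight) * v
        for t, v in zip(text_norm, visual_norm)
    ]
    # Sort candidates by fused score, highest first.
    ranked = sorted(zip(candidates, fused), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


ranked = hybrid_rerank(
    ["doc_a", "doc_b", "doc_c"],
    text_scores=[0.1, 0.9, 0.5],
    visual_scores=[0.2, 0.4, 0.8],
)
print([name for name, _ in ranked])  # ['doc_c', 'doc_b', 'doc_a']
```

Normalizing before fusing prevents the reranker with the larger raw score range from dominating the result.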
## 🏗️ Architecture
```
┌─────────────────────────────────────────────┐
│                     APP                     │
│             (Async/Sync Modes)              │
└──────────────────────┬──────────────────────┘
                       │
              ┌────────┴────────┐
              │                 │
         ┌────▼────┐       ┌────▼────┐
         │Agent 1  │       │Agent 2  │
         │LlamaIdx │       │Smolagent│
         └────┬────┘       └────┬────┘
              │                 │
         ┌────▼────┐       ┌────▼────┐
         │Dynamic  │       │BM25 +   │
         │RAG +    │       │Langfuse │
         │Hybrid   │       │Observ.  │
         │Rerank   │       │         │
         └─────────┘       └─────────┘
```
## 🚀 Quick Start
### Prerequisites
### Installation
1. **Clone the repository**:
```bash
git clone https://github.com/yourusername/gaia-agents-challenge
cd gaia-agents-challenge
```
2. **Install FlagEmbedding with visual support**:
```bash
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding/research/visual_bge
pip install -e .
cd ../../..
```
3. **Install additional dependencies**:
#### For Agent 1:
```bash
pip install -r requirements.txt
```
#### For Agent 2:
```bash
pip install -r requirements2.txt
```
4. **Set environment variables**:
```bash
export GOOGLE_API_KEY="your_gemini_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
export LANGFUSE_PUBLIC_KEY="your_langfuse_public_key" # Optional
export LANGFUSE_SECRET_KEY="your_langfuse_secret_key" # Optional
```
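
Before launching either agent, it can help to confirm that these variables are actually visible to Python. The following check is a small sketch; the helper name `check_environment` is hypothetical, and the split into required and optional variables follows the `# Optional` comments in the export block.

```python
import os

REQUIRED_VARS = ["GOOGLE_API_KEY", "HUGGINGFACEHUB_API_TOKEN"]
OPTIONAL_VARS = ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"]


def check_environment(env=None):
    """Return (missing_required, missing_optional) variable names."""
    env = os.environ if env is None else env
    # A variable set to an empty string is treated as missing.
    missing_required = [name for name in REQUIRED_VARS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_environment()
    if required:
        print("Missing required variables:", ", ".join(required))
    if optional:
        print("Missing optional variables:", ", ".join(optional))
    if not required and not optional:
        print("All variables are set.")
```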
### Usage
```bash
# LlamaIndex Agent
python agent.py
# Smolagents Agent
python agent2.py
```
## 📁 Project Structure
```
├── agent.py              # LlamaIndex-based agent with dynamic RAG
├── agent2.py             # Smolagents-based agent with observability
├── appasync.py           # Original async Gradio interface
├── app.py                # Original sync Gradio interface
├── custom_models.py      # Custom model implementations
├── requirements.txt      # Python dependencies
└── README.md             # This file
```
## 🧪 Testing
### Run Individual Components
```bash
# Test BAAI embedding
python -c "from custom_models import BaaiMultimodalEmbedding; print('BAAI OK')"
# Test Pixtral quantized
python -c "from custom_models import PixtralQuantizedLLM; print('Pixtral OK')"
# Test agents
python agent.py
python agent2.py
```
### Run GAIA Evaluation
```bash
# Through the web interface
python app.py
# Or programmatically
python -c "
from agent2 import GAIAAgent
agent = GAIAAgent()
result = agent.solve_gaia_question({'Question': 'Test question', 'task_id': 'test'})
print(result)
"
```
## 🔧 Customization
### Adding New Models
1. Create a new class in `custom_models.py`
2. Implement the required interfaces
3. Update the agent configuration
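
The three steps above can be sketched as follows. This README does not document the actual interfaces in `custom_models.py`, so everything here is hypothetical: `TextEmbedder`, `CharCountEmbedder`, `MODEL_REGISTRY`, and `register_model` are invented for illustration and only show the general pattern of defining an interface, implementing it, and registering the result.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextEmbedder(Protocol):
    """Hypothetical interface that a custom embedding model must satisfy."""

    def embed(self, text: str) -> list[float]: ...


class CharCountEmbedder:
    """Toy model: embeds text as vowel counts (illustration only)."""

    def embed(self, text: str) -> list[float]:
        lowered = text.lower()
        return [float(lowered.count(ch)) for ch in "aeiou"]


# Hypothetical registry standing in for "the agent configuration".
MODEL_REGISTRY = {}


def register_model(name, model):
    """Register a model after checking that it exposes an embed() method."""
    if not isinstance(model, TextEmbedder):
        raise TypeError(f"{type(model).__name__} does not implement embed()")
    MODEL_REGISTRY[name] = model


register_model("char-count", CharCountEmbedder())
print(MODEL_REGISTRY["char-count"].embed("Audio"))  # [1.0, 0.0, 1.0, 1.0, 1.0]
```

Checking the interface at registration time surfaces a missing method immediately rather than in the middle of an evaluation run.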
### Modifying RAG Behavior
- Edit `DynamicQueryEngineManager` in `agent.py`
- Adjust reranking strategies in `HybridReranker`
- Configure search parameters in `enhanced_web_search_tool`
### UI Customization
- Modify `app_unified.py` for interface changes
- Add new execution modes
- Integrate additional observability tools
## 🐛 Troubleshooting
### Common Issues
#### Model Loading Failures
- Check internet connectivity for model downloads
- Verify HuggingFace token permissions
- Clear model cache: `rm -rf ~/.cache/huggingface/`
#### Visual BGE Import Errors
```bash
# Ensure proper installation
cd FlagEmbedding/research/visual_bge
pip install -e .
```
## 🔗 References
- [GAIA Benchmark](https://huggingface.co/datasets/gaia-benchmark/GAIA)
- [LlamaIndex](https://github.com/run-llama/llama_index)
- [BGE Models](https://github.com/FlagOpen/FlagEmbedding)
- [Gradio](https://github.com/gradio-app/gradio)