# 🤖 Advanced GAIA Agents Challenge Solution

A solution for the [Hugging Face Agents Course Unit 4 GAIA Challenge](https://huggingface.co/learn/agents-course/unit4/hands-on), featuring multimodal AI agents with dynamic RAG capabilities, quantized models for Kaggle compatibility, and both synchronous and asynchronous execution modes.

## 🌟 Features

### 🧠 Dual Agent Architecture
- **Agent 1 (LlamaIndex)**: Advanced multimodal agent with dynamic knowledge base and hybrid reranking
- **Agent 2 (Smolagents)**: Gemini-powered agent with BM25 retrieval and observability
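
For readers unfamiliar with BM25, the ranking idea behind Agent 2's retrieval can be sketched in plain Python. This is an illustrative implementation of the standard Okapi BM25 formula, not the retriever used in `agent2.py`; the function name `bm25_scores`, the whitespace tokenizer, and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (illustrative only)."""
    if not docs:
        return []
    # Naive whitespace tokenization; a real retriever would normalize more carefully.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical sample corpus, used only to exercise the function.
docs = [
    "GAIA benchmark questions require multi-step reasoning",
    "BM25 ranks documents by term frequency and rarity",
    "Gradio builds web interfaces for machine learning demos",
]
scores = bm25_scores("bm25 term frequency", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score zero, so only the second sample document is ranked as relevant here.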

### Features for Agent 1
### 🎯 Multimodal Capabilities
- **BAAI Visualized Embedding**: BGE-M3 based multimodal embeddings running on cuda:1
- **Pixtral 12B Quantized**: FP8/4-bit quantized vision-language model for resource-constrained environments
- **Hybrid Retrieval**: Text + visual content processing with ColPali and SentenceTransformer reranking

### ⚡ Execution Modes
- **Asynchronous Mode**: Concurrent question processing for maximum speed
- **Kaggle Compatibility**: Optimized for resource-constrained environments
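
A minimal sketch of what concurrent question processing can look like with `asyncio`, assuming each question is handled by an awaitable agent call. The names `answer_question`, `run_all`, and `max_concurrency` are hypothetical and do not come from `appasync.py`; the sleep stands in for model or network latency.

```python
import asyncio


async def answer_question(question: dict) -> dict:
    """Hypothetical stand-in for a single agent call."""
    await asyncio.sleep(0.01)  # simulates model / network latency
    return {
        "task_id": question["task_id"],
        "answer": f"answer to {question['Question']}",
    }


async def run_all(questions, max_concurrency=4):
    """Answer all questions concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question):
        async with semaphore:
            return await answer_question(question)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(q) for q in questions))


questions = [{"task_id": f"t{i}", "Question": f"question {i}"} for i in range(8)]
results = asyncio.run(run_all(questions))
print(len(results), "answers collected")
```

Bounding concurrency with a semaphore is one way to keep memory and API rate limits under control on constrained hardware such as Kaggle notebooks.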

### 🔍 Advanced RAG System
- **Dynamic Knowledge Base**: Automatically updated with web search results
- **Multimodal Parsing**: Handles text, images, PDFs, audio, and video files
- **Smart Reranking**: Hybrid approach combining text and visual rerankers
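
One common way to combine a text reranker and a visual reranker is weighted score fusion after normalization. The sketch below assumes that approach; it is not the logic of the project's `HybridReranker`, and `Candidate`, `hybrid_rerank`, `text_weight`, and the sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    text_score: float    # e.g. from a text cross-encoder (assumed)
    visual_score: float  # e.g. from a visual reranker (assumed)


def min_max(values):
    """Rescale scores to [0, 1] so the two rerankers are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rerank(candidates, text_weight=0.6, top_k=3):
    """Fuse normalized text and visual scores with a weighted sum."""
    if not candidates:
        return []
    text = min_max([c.text_score for c in candidates])
    visual = min_max([c.visual_score for c in candidates])
    fused = [text_weight * t + (1 - text_weight) * v for t, v in zip(text, visual)]
    ranked = sorted(zip(candidates, fused), key=lambda pair: pair[1], reverse=True)
    return [(c.doc_id, round(score, 3)) for c, score in ranked[:top_k]]


# Hypothetical scores, used only to exercise the function.
candidates = [
    Candidate("page-1", text_score=0.9, visual_score=0.1),
    Candidate("page-2", text_score=0.5, visual_score=0.9),
    Candidate("page-3", text_score=0.1, visual_score=0.5),
]
ranking = hybrid_rerank(candidates)
print(ranking)
```

In this example the document that is moderately strong on text and strongest visually ranks first, which is the behavior a hybrid reranker is meant to capture.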

## 🏗️ Architecture

```
┌─────────────────────────────────────────────┐
│                  APP                        │
│            (Async/Sync Modes)               │
└─────────────────┬───────────────────────────┘
                  │
         ┌────────┴────────┐
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Agent 1  │       │Agent 2  │
    │LlamaIdx │       │Smolagent│
    └────┬────┘       └────┬────┘
         │                 │
    ┌────▼────┐       ┌────▼────┐
    │Dynamic  │       │BM25 +   │
    │RAG +    │       │Langfuse │
    │Hybrid   │       │Observ.  │
    │Rerank   │       │         │
    └─────────┘       └─────────┘
```

## 🚀 Quick Start

### Prerequisites

### Installation

1. **Clone the repository**:
```bash
git clone https://github.com/yourusername/gaia-agents-challenge
cd gaia-agents-challenge
```

2. **Install FlagEmbedding with visual support**:
```bash
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding/research/visual_bge
pip install -e .
cd ../../..
```

3. **Install additional dependencies**:
#### For Agent 1: 
```bash
pip install -r requirements.txt
```
#### For Agent 2: 
```bash
pip install -r requirements2.txt
```


4. **Set environment variables**:
```bash
export GOOGLE_API_KEY="your_gemini_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
export LANGFUSE_PUBLIC_KEY="your_langfuse_public_key"  # Optional
export LANGFUSE_SECRET_KEY="your_langfuse_secret_key"  # Optional
```
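
Before launching an agent, it can help to verify that these variables are set. The following is a small sketch, assuming the first two variables are required (the source only marks the Langfuse keys as optional); `check_environment` is a hypothetical helper, not part of the repository.

```python
import os

# Assumption: the variables not marked "Optional" above are treated as required.
REQUIRED = ("GOOGLE_API_KEY", "HUGGINGFACEHUB_API_TOKEN")
OPTIONAL = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def check_environment(env=None):
    """Return (missing_required, missing_optional) variable names."""
    env = os.environ if env is None else env
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_optional = [name for name in OPTIONAL if not env.get(name)]
    return missing_required, missing_optional


missing_required, missing_optional = check_environment()
if missing_required:
    print("Missing required variables:", ", ".join(missing_required))
if missing_optional:
    print("Langfuse variables not set:", ", ".join(missing_optional))
```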

### Usage

```bash
# LlamaIndex Agent
python agent.py

# Smolagents Agent
python agent2.py
```

## 📁 Project Structure

```
├── agent.py                # LlamaIndex-based agent with dynamic RAG
├── agent2.py               # Smolagents-based agent with observability
├── appasync.py             # Original async Gradio interface
├── app.py                  # Original sync Gradio interface
├── custom_models.py        # Custom model implementations
├── requirements.txt        # Python dependencies
└── README.md               # This file
```

## 🧪 Testing

### Run Individual Components
```bash
# Test BAAI embedding
python -c "from custom_models import BaaiMultimodalEmbedding; print('BAAI OK')"

# Test Pixtral quantized
python -c "from custom_models import PixtralQuantizedLLM; print('Pixtral OK')"

# Test agents
python agent.py
python agent2.py
```

### Run GAIA Evaluation
```bash
# Through the web interface
python app.py

# Or programmatically
python -c "
from agent2 import GAIAAgent
agent = GAIAAgent()
result = agent.solve_gaia_question({'Question': 'Test question', 'task_id': 'test'})
print(result)
"
```

## 🔧 Customization

### Adding New Models
1. Create a new class in `custom_models.py`
2. Implement the required interfaces
3. Update the agent configuration
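
The three steps above can be sketched as follows. The actual interfaces expected by `custom_models.py` are not shown in this README, so everything here is hypothetical: the `Embedder` protocol, the toy `HashingEmbedder`, and the `MODEL_REGISTRY` lookup only illustrate the pattern of defining a class, satisfying an interface, and registering it in configuration.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical interface: anything with an embed(text) method."""

    def embed(self, text: str) -> list[float]: ...


class HashingEmbedder:
    """Step 1 and 2: a toy model class that satisfies the interface."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            # Each token increments exactly one bucket of the vector.
            vector[digest[0] % self.dim] += 1.0
        return vector


# Step 3: a hypothetical configuration entry mapping names to model classes.
MODEL_REGISTRY = {"hashing": HashingEmbedder}


def build_embedder(name: str, **kwargs) -> Embedder:
    model = MODEL_REGISTRY[name](**kwargs)
    if not isinstance(model, Embedder):
        raise TypeError(f"{name!r} does not implement the Embedder interface")
    return model


embedder = build_embedder("hashing", dim=4)
vector = embedder.embed("dynamic knowledge base")
print(len(vector), sum(vector))
```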

### Modifying RAG Behavior
- Edit `DynamicQueryEngineManager` in `agent.py`
- Adjust reranking strategies in `HybridReranker`
- Configure search parameters in `enhanced_web_search_tool`

### UI Customization
- Modify `app_unified.py` for interface changes
- Add new execution modes
- Integrate additional observability tools

## 🐛 Troubleshooting

### Common Issues

#### Model Loading Failures
- Check internet connectivity for model downloads
- Verify HuggingFace token permissions
- Clear model cache: `rm -rf ~/.cache/huggingface/`

#### Visual BGE Import Errors
```bash
# Ensure proper installation
cd FlagEmbedding/research/visual_bge
pip install -e .
```

## 🔗 References

- [GAIA Benchmark](https://huggingface.co/datasets/gaia-benchmark/GAIA)
- [LlamaIndex](https://github.com/run-llama/llama_index)
- [BGE Models](https://github.com/FlagOpen/FlagEmbedding)
- [Gradio](https://github.com/gradio-app/gradio)