---
title: GAIA Agent Project
emoji: 🌱
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.34.0
app_file: app.py
pinned: false
hf_oauth: true
---

# GAIA Agent Project

AI agent for the GAIA benchmark, built for the Hugging Face Agents Course Certificate of Excellence.

## Overview

This project implements an AI agent that can solve tasks from the GAIA (General AI Assistants) benchmark. The agent uses xAI's Grok API for reasoning and includes tools for web search, file handling, and mathematical calculations.

## Goal

Achieve a score of ≥30% on the GAIA benchmark to earn the Certificate of Excellence from the Hugging Face Agents Course.

## Project Structure

```
├── agent.py          # Main GAIA agent implementation
├── tools.py          # Tool implementations (web search, file handling)
├── evaluate.py       # Evaluation script and scoring
├── test_agent.py     # Test suite for verification
├── requirements.txt  # Python dependencies
├── README.md         # This file
├── .gitignore        # Git ignore rules
└── submission.jsonl  # Generated submission file
```

## Setup

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. API Configuration

The agent uses xAI's Grok API. The API key is already configured in the code for this project.

### 3. Optional: SerpAPI for Enhanced Web Search

For better web search results, you can sign up for SerpAPI:
1. Visit https://serpapi.com/ and create an account
2. Get your API key
3. Update the `serpapi_key` in `agent.py`

## Usage

### Quick Test

Run the test suite to verify everything is working:

```bash
python test_agent.py
```

### Full Evaluation

Run the full evaluation on sample tasks:

```bash
python evaluate.py
```

Limit the number of tasks evaluated:

```bash
python evaluate.py --max-tasks 10
```

Run with a custom dataset:

```bash
python evaluate.py --dataset path/to/gaia_dataset.jsonl
```
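
The two flags above could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the actual contents of `evaluate.py`: the flag names come from the commands above, while the `build_parser` helper, the defaults, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI definition mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Evaluate the GAIA agent")
    # Assumed default: None means "run every task in the dataset".
    parser.add_argument("--max-tasks", type=int, default=None,
                        help="maximum number of tasks to evaluate")
    # Assumed default: None means "fall back to the built-in sample tasks".
    parser.add_argument("--dataset", type=str, default=None,
                        help="path to a GAIA dataset in JSON/JSONL format")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"max_tasks={args.max_tasks} dataset={args.dataset}")
```

Note that `argparse` exposes `--max-tasks` as `args.max_tasks`, replacing the hyphen with an underscore.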

## Components

### Agent (`agent.py`)

- **GAIAAgent**: Main agent class that processes GAIA tasks
- **call_grok()**: Interface to xAI Grok API with retry logic
- **process_task()**: Main task processing pipeline
- **extract_final_answer()**: Extracts formatted answers from responses

### Tools (`tools.py`)

- **web_search()**: Web search via SerpAPI, falling back to DuckDuckGo
- **read_file()**: Handles text, CSV, and image files
- **execute_code()**: Safe Python code execution (limited)
- **calculate_simple_math()**: Basic mathematical calculations
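
One way to evaluate basic arithmetic without calling `eval()` is to walk the parsed expression tree and allow only numeric literals and a fixed set of operators. The sketch below shows that approach; it is an assumption about how such a tool might work, not the implementation in `tools.py`, and `safe_eval_arithmetic` is a hypothetical name.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval_arithmetic(expression: str) -> Number:
    """Evaluate +, -, *, /, % over numeric literals; reject everything else."""

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```

Exponentiation is deliberately left out of the whitelist, since an input such as `9 ** 9 ** 9` could consume excessive time and memory.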

### Evaluation (`evaluate.py`)

- **evaluate_agent()**: Main evaluation function
- **load_gaia_dataset()**: Loads GAIA dataset from JSON/JSONL
- **normalize_answer()**: Normalizes answers for comparison
- **create_sample_dataset()**: Creates sample tasks for testing
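
As a rough illustration of dataset loading and answer normalization, the sketch below parses JSONL records and normalizes answers before comparison. The helper `parse_jsonl` and the specific normalization rules (lowercasing, collapsing whitespace, dropping trailing sentence punctuation) are assumptions, not the behavior of `evaluate.py`.

```python
import json
import re
from typing import Dict, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON object per non-empty line (works on an open file too)."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records


def normalize_answer(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing sentence punctuation."""
    text = re.sub(r"\s+", " ", answer.strip().lower())
    return text.rstrip(".!?")


if __name__ == "__main__":
    sample = ['{"question": "What is the capital of France?", "answer": "Paris"}']
    task = parse_jsonl(sample)[0]
    print(normalize_answer(" Paris. ") == normalize_answer(task["answer"]))
```

Only trailing punctuation is removed, so a decimal answer such as `3.14` keeps its decimal point.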

## Features

- ✅ xAI Grok API integration with retry logic
- ✅ Web search capabilities (SerpAPI + DuckDuckGo fallback)
- ✅ Multi-format file handling (text, CSV, images)
- ✅ OCR support for image-based tasks (with pytesseract)
- ✅ Safe code execution environment
- ✅ Comprehensive evaluation system
- ✅ JSONL submission format generation
- ✅ Progress tracking and scoring

## GAIA Task Types

The agent handles different GAIA task levels:

- **Level 1**: Simple questions requiring basic knowledge
- **Level 2**: Multi-step reasoning tasks
- **Level 3**: Complex tasks involving files, images, or code

## Sample Tasks

The evaluation includes sample tasks like:

- Basic arithmetic: "What is 15 + 27?"
- General knowledge: "What is the capital of France?"
- Date calculations: "How many days are in a leap year?"
- Multi-step math: "What is 2 * 6 * 7?"
- Historical facts: "What year did World War II end?"

## Scoring

- Target: ≥30% accuracy for Certificate of Excellence
- Current leaderboard top score: ~76%
- Evaluation provides detailed per-task feedback
- Generates `submission.jsonl` in required format
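
The scoring and submission steps can be sketched as follows. This is illustrative only: the exact-match comparison is a simplification, and the `task_id` and `model_answer` field names are an assumption about the required submission format, so check them against the leaderboard instructions before relying on them.

```python
import json
from typing import Dict, List

TARGET_ACCURACY = 0.30  # certificate threshold stated in this README


def accuracy(predictions: Dict[str, str], references: Dict[str, str]) -> float:
    """Fraction of reference tasks answered correctly (case-insensitive match)."""
    if not references:
        return 0.0
    correct = sum(
        1 for task_id, expected in references.items()
        if predictions.get(task_id, "").strip().lower() == expected.strip().lower()
    )
    return correct / len(references)


def to_submission_lines(predictions: Dict[str, str]) -> List[str]:
    """Serialize predictions as one JSON object per line (assumed field names)."""
    return [
        json.dumps({"task_id": task_id, "model_answer": answer})
        for task_id, answer in predictions.items()
    ]


if __name__ == "__main__":
    preds = {"t1": "42", "t2": "London"}
    refs = {"t1": "42", "t2": "Paris"}
    score = accuracy(preds, refs)
    print(f"accuracy={score:.0%} meets_target={score >= TARGET_ACCURACY}")
    with open("submission.jsonl", "w", encoding="utf-8") as handle:
        handle.write("\n".join(to_submission_lines(preds)) + "\n")
```

Missing predictions count as wrong answers here, because the denominator is the number of reference tasks.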

## Troubleshooting

### API Issues
- Verify internet connection
- Check API key validity
- Monitor rate limits

### Import Errors
- Ensure all dependencies are installed: `pip install -r requirements.txt`
- For OCR: Install system dependency `tesseract-ocr`
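
A quick way to confirm the OCR system dependency is present is to look for the executable on `PATH`. The snippet below is a small sketch using only the standard library; it assumes the binary is named `tesseract`, which is what the `tesseract-ocr` package typically installs.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the Tesseract executable is found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("tesseract found on PATH")
    else:
        print("tesseract not found; install the tesseract-ocr system package")
```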

### File Reading Issues
- Check file paths and permissions
- Verify file formats are supported

## Development

### Testing
Run the test suite before making changes:
```bash
python test_agent.py
```

### Adding New Tools
1. Implement the tool function in `tools.py`
2. Import and use in `agent.py`
3. Add tests in `test_agent.py`

### Improving Performance
- Optimize prompts for better reasoning
- Add more sophisticated web search
- Enhance file processing capabilities
- Implement better answer extraction

## Submission

1. Run evaluation: `python evaluate.py`
2. Upload `submission.jsonl` to the Hugging Face leaderboard
3. Verify that the score is ≥30% for certificate eligibility

## Resources

- [GAIA Benchmark](https://github.com/gaia-benchmark/GAIA)
- [xAI API Documentation](https://x.ai/api)
- [Hugging Face Agents Course](https://huggingface.co/docs)
- [SerpAPI](https://serpapi.com/)

## License

This project is created for educational purposes as part of the Hugging Face Agents Course.

---

**Good luck achieving the 30% score for your Certificate of Excellence! 🎉**