# General Agent with Audio, Image, and File Processing

This is a general-purpose agent built with LangChain/LangGraph. It includes tools for processing web content, documents, media, and code.

## Tool Categories

The agent includes tools for working with:

- **Web Content**: Search web pages, news articles, Wikipedia, ArXiv papers
- **Files**: Read PDF, DOCX, and Excel files
- **Media**: Process images, transcribe audio, extract text from YouTube videos
- **Code**: Analyze code structure, read code files, analyze functions
- **Math**: Basic math operations

## Testing Your Installation

Before using the agent, check that all dependencies are installed and that the tools work correctly:

1. **Check and install dependencies**:
   ```
   python fix_dependencies.py
   ```
   This script will check for missing Python packages and system dependencies.

2. **Test all tools**:
   ```
   python test_all_tools.py
   ```
   This will test all tools and report any issues.

3. **Test image and audio processing specifically**:
   ```
   python test_image_audio.py
   ```
   This focuses on testing media processing tools and provides detailed troubleshooting steps.

## System Requirements

For full functionality, you'll need:

- **Python 3.8+**
- **Tesseract OCR** (for image text extraction)
- **FFmpeg** (for audio processing)
- **Internet connection** (for web search, YouTube, etc.)
- **API Keys**: `GROQ_API_KEY` must be set in a `.env` file or as an environment variable
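
The requirements above can be spot-checked with a few lines of standard-library Python. This is a minimal sketch, not the logic of `fix_dependencies.py`: the function name `check_requirements` and its parameters are hypothetical, the binary names `tesseract` and `ffmpeg` are the usual executable names and are assumed here, and the sketch only inspects the process environment (it does not parse a `.env` file).

```python
import os
import shutil
import sys


def check_requirements(env=None, which=shutil.which, version=None):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration only. `env`, `which`, and `version`
    are injectable so the function can be exercised without touching the host.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []

    # Python 3.8+ is listed as the minimum version.
    if version < (3, 8):
        problems.append(f"Python 3.8+ required, found {version[0]}.{version[1]}")

    # Assumed executable names for Tesseract OCR and FFmpeg.
    for binary, purpose in (
        ("tesseract", "image text extraction"),
        ("ffmpeg", "audio processing"),
    ):
        if which(binary) is None:
            problems.append(f"{binary} not found on PATH (needed for {purpose})")

    # Only the environment is checked; a .env file is not read here.
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set")

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("MISSING:", problem)
```

An empty result does not prove the tools work end to end; it only confirms that the executables are discoverable and the key is present.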

## Agent Structure

This agent uses a streamlined 3-node graph structure:

1. **PerceptionAgent**: Handles web searches and information lookup
2. **ActionAgent**: Performs calculations, file operations, code analysis
3. **EvaluationAgent**: Ensures answers are properly formatted
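
To illustrate how three such nodes might hand a shared state to one another, here is a dependency-free sketch. It is not the agent's actual LangGraph code: the node functions, the state keys (`question`, `facts`, `draft`, `answer`, `trace`), and the strictly linear hand-off are all assumptions made for illustration, and the real graph may route between nodes differently.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def perception_node(state: State) -> State:
    # Stand-in for web search / lookup: attaches a placeholder "fact".
    facts = [f"lookup result for: {state['question']}"]
    return {**state, "facts": facts, "trace": state["trace"] + ["perception"]}


def action_node(state: State) -> State:
    # Stand-in for calculations, file operations, or code analysis.
    draft = f"  draft answer based on {len(state['facts'])} fact(s)  "
    return {**state, "draft": draft, "trace": state["trace"] + ["action"]}


def evaluation_node(state: State) -> State:
    # Stand-in for answer formatting: here it only trims whitespace.
    answer = state["draft"].strip()
    return {**state, "answer": answer, "trace": state["trace"] + ["evaluation"]}


# Hypothetical fixed ordering of the three nodes.
PIPELINE: List[Callable[[State], State]] = [
    perception_node,
    action_node,
    evaluation_node,
]


def run_pipeline(question: str) -> State:
    """Pass one state dict through each node in order and return the result."""
    state: State = {"question": question, "trace": []}
    for node in PIPELINE:
        state = node(state)
    return state


if __name__ == "__main__":
    final = run_pipeline("What is the capital of France?")
    print(final["trace"])
    print(final["answer"])
```

Each node returns a new dict rather than mutating its input, which keeps the hand-off between stages easy to inspect.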

## Common Issues

If you encounter issues:

1. **Web Scraping Errors**: The agent has robust error handling for 403 Forbidden errors
2. **Audio Processing Errors**: Make sure FFmpeg is installed and in your PATH
3. **Image Processing Errors**: Make sure Tesseract OCR is installed and in your PATH
4. **GROQ API Rate Limits**: The agent includes automatic rate limiting and retry mechanisms
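
For readers unfamiliar with retry mechanisms of the kind mentioned in item 4, the following sketch shows one common approach, exponential backoff with jitter. It is an assumption that the agent works this way; `RateLimitError` and `call_with_retry` are hypothetical names and do not come from the agent or from the Groq client library.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises when rate limited."""


def call_with_retry(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff.

    The wait before retry n (0-based) is base_delay * 2**n plus a small
    random jitter. The last failure is re-raised once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky():
        # Fails twice, then succeeds, to demonstrate the retry path.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RateLimitError("too many requests")
        return "ok"

    print(call_with_retry(flaky, base_delay=0.01))
```

The `sleep` parameter is injectable so the backoff schedule can be tested without actually waiting.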

## Running GAIA Tests

To test whether the agent handles factual questions in the GAIA format:

```
python test_factual_questions.py
```

## Testing Individual Tools

```python
from agent import multiply, add, subtract, divide, modulus  # Math tools
from agent import web_search, wiki_search  # Web tools
from agent import read_text_from_pdf, read_text_from_docx  # Document tools
from agent import image_processing  # Image tools
from agent import transcribe_audio  # Audio tools
from agent import analyze_code, read_code_file  # Code tools

# Test a tool directly
result = multiply(5, 7)
print(result)  # 35

# Process an image
image_description = image_processing("Describe this image", "path/to/image.jpg")
print(image_description)
```