# General Agent with Audio, Image, and File Processing
This is a general-purpose agent built with LangChain/LangGraph. It includes tools for processing web content, files, media, code, and basic math.
## Tool Categories
The agent includes tools for working with:
- **Web Content**: Search web pages, news articles, Wikipedia, ArXiv papers
- **Files**: Read PDFs, DOCXs, Excel files
- **Media**: Process images, transcribe audio, extract text from YouTube videos
- **Code**: Analyze code structure, read code files, analyze functions
- **Math**: Basic math operations
## Testing Your Installation
Before using the agent, check that all dependencies are installed and that the tools work correctly:
1. **Check and install dependencies**:
```
python fix_dependencies.py
```
This script will check for missing Python packages and system dependencies.
2. **Test all tools**:
```
python test_all_tools.py
```
This will test all tools and report any issues.
3. **Test image and audio processing specifically**:
```
python test_image_audio.py
```
This focuses on testing media processing tools and provides detailed troubleshooting steps.
## System Requirements
For full functionality, you'll need:
- **Python 3.8+**
- **Tesseract OCR** (for image text extraction)
- **FFmpeg** (for audio processing)
- **Internet connection** (for web search, YouTube, etc.)
- **API Keys**: `GROQ_API_KEY` must be set in a `.env` file or as an environment variable
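
The requirements above can be checked up front with a short script. This is a minimal sketch, not part of the repository: the helper name `check_requirements` is hypothetical, and it assumes the executables are named `tesseract` and `ffmpeg` on your PATH. It only inspects the process environment, so a key stored solely in `.env` will be reported as missing unless it has been loaded first.

```python
import os
import shutil
import sys


def check_requirements(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version
    problems = []

    if tuple(version[:2]) < (3, 8):
        problems.append("Python 3.8+ is required")
    # Assumed executable names; adjust if your install uses different ones.
    for exe, purpose in (("tesseract", "image text extraction"),
                         ("ffmpeg", "audio processing")):
        if which(exe) is None:
            problems.append(f"{exe} not found on PATH (needed for {purpose})")
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("MISSING:", problem)
```

The `env`, `which`, and `version` parameters exist only so the check can be exercised without touching the real system.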
## Agent Structure
This agent uses a streamlined 3-node graph structure:
1. **PerceptionAgent**: Handles web searches and information lookup
2. **ActionAgent**: Performs calculations, file operations, code analysis
3. **EvaluationAgent**: Ensures answers are properly formatted
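
The flow through the three nodes can be pictured as a state dictionary handed from one stage to the next. The sketch below uses plain Python functions as stand-ins and does not use LangGraph; the function names, state keys, and the strictly linear order are assumptions for illustration, since the actual routing between nodes is not described here.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def perception(state: State) -> State:
    # Stand-in for PerceptionAgent: would gather information (e.g. web search).
    return {**state, "context": f"notes about: {state['question']}"}


def action(state: State) -> State:
    # Stand-in for ActionAgent: would run calculations, file or code tools.
    return {**state, "draft": f"draft answer using {state['context']}"}


def evaluation(state: State) -> State:
    # Stand-in for EvaluationAgent: would enforce the final answer format.
    return {**state, "answer": state["draft"].strip()}


def run_pipeline(question: str, nodes: List[Callable[[State], State]]) -> State:
    # Pass the state through each node in order, collecting new keys as it goes.
    state: State = {"question": question}
    for node in nodes:
        state = node(state)
    return state


result = run_pipeline("What is 5 * 7?", [perception, action, evaluation])
print(result["answer"])
```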
## Common Issues
If you encounter issues:
1. **Web Scraping Errors**: The agent has robust error handling for 403 Forbidden errors
2. **Audio Processing Errors**: Make sure FFmpeg is installed and in your PATH
3. **Image Processing Errors**: Make sure Tesseract OCR is installed and in your PATH
4. **GROQ API Rate Limits**: The agent includes automatic rate limiting and retry mechanisms
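
For rate limits in particular, retrying with exponential backoff is a common pattern. The sketch below is a generic illustration, not the agent's actual mechanism: `RateLimitError`, `call_with_retry`, and the delay values are hypothetical names and defaults chosen for the example.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error the API client raises on HTTP 429."""


def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.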
## Running GAIA Tests
To test whether the agent handles factual questions in the GAIA format:
```
python test_factual_questions.py
```
## Testing Individual Tools
```python
from agent import multiply, add, subtract, divide, modulus # Math tools
from agent import web_search, wiki_search # Web tools
from agent import read_text_from_pdf, read_text_from_docx # Document tools
from agent import image_processing # Image tools
from agent import transcribe_audio # Audio tools
from agent import analyze_code, read_code_file # Code tools
# Test a tool directly
result = multiply(5, 7)
print(result) # 35
# Process an image
image_description = image_processing("Describe this image", "path/to/image.jpg")
print(image_description)
```