Spaces:
Running
🏗️ Priority 2A: Architecture Consolidation & Optimization Complete
Browse files✅ **PHASE 1: App Consolidation**
- Consolidated 8 app variants into single robust app.py with intelligent mode selection
- Created archive/app_variants/ to preserve all previous versions
- Enhanced ConsolidatedGAIAInterface with advanced capability detection
- Added graceful degradation for missing dependencies
- Unified interface supporting demo, individual, and comprehensive testing modes
✅ **PHASE 2: Architecture Decision - Hybrid Approach**
- Created main_hybrid.py combining best of legacy and refactored architectures
- Intelligent architecture selection: refactored → legacy fallback
- Unified interface regardless of underlying architecture
- Environment variable control (GAIA_ARCHITECTURE=auto/legacy/refactored)
- Production-proven legacy with modern modular benefits
✅ **PHASE 3: Production Optimization**
- Optimized requirements.txt with core vs optional dependencies
- Added comprehensive health_check.py for system monitoring
- Integrated health check into web interface with detailed reports
- Added dependency fallback strategies throughout codebase
- Enhanced error handling and graceful degradation
**🎯 CONSOLIDATION ACHIEVEMENTS:**
- ✅ **Single App Interface**: 1 robust app vs 8 variants (87.5% reduction)
- ✅ **Architecture Flexibility**: Hybrid system with intelligent selection
- ✅ **Optimized Dependencies**: Faster HF Space startup with optional deps
- ✅ **Production Monitoring**: Built-in health checks and system status
- ✅ **Maintainability**: Clean codebase with archived backups
**🔧 TECHNICAL IMPROVEMENTS:**
- Capability detection system for graceful feature availability
- Hybrid solver with unified interface for both architectures
- Health monitoring with dependency, API key, and component checks
- Optimized requirements with clear core vs optional separation
- Enhanced error handling throughout the application stack
**📊 EXPECTED BENEFITS:**
- **Faster Deployment**: Optimized dependencies for quicker HF Space builds
- **Better Stability**: Graceful handling of missing components
- **Easier Maintenance**: Single consolidated interface vs multiple variants
- **Enhanced Monitoring**: Real-time system health and capability tracking
- **Future-Ready**: Clean foundation for additional feature development
This establishes a production-optimized, maintainable foundation while preserving
all existing functionality and the 85% accuracy performance.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- app.py +371 -126
- archive/app_variants/app_backup.py +412 -0
- app_comprehensive.py → archive/app_variants/app_comprehensive.py +0 -0
- app_demo.py → archive/app_variants/app_demo.py +0 -0
- app_full.py → archive/app_variants/app_full.py +0 -0
- app_minimal.py → archive/app_variants/app_minimal.py +0 -0
- archive/app_variants/app_original.py +412 -0
- archive/app_variants/app_simple.py +120 -0
- app_test.py → archive/app_variants/app_test.py +0 -0
- health_check.py +243 -0
- main_hybrid.py +188 -0
- requirements.txt +23 -12
- requirements_original.txt +19 -0
@@ -1,7 +1,7 @@
|
|
1 |
#!/usr/bin/env python3
|
2 |
"""
|
3 |
-
Advanced GAIA Agent - Production
|
4 |
-
|
5 |
"""
|
6 |
|
7 |
import gradio as gr
|
@@ -9,48 +9,140 @@ import asyncio
|
|
9 |
import json
|
10 |
import os
|
11 |
import time
|
|
|
12 |
from datetime import datetime
|
|
|
13 |
|
14 |
-
#
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
15 |
try:
|
16 |
-
|
17 |
-
from
|
18 |
-
|
|
|
19 |
except ImportError:
|
20 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
21 |
|
22 |
-
|
23 |
-
|
|
|
|
|
|
|
|
|
24 |
|
25 |
def __init__(self):
|
26 |
self.solver = None
|
|
|
27 |
self.test_running = False
|
28 |
self.initialization_error = None
|
29 |
self.last_test_time = None
|
30 |
self.session_cleanup_threshold = 3600 # 1 hour
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
31 |
|
32 |
-
if
|
33 |
try:
|
34 |
self.solver = GAIASolver()
|
|
|
|
|
35 |
except Exception as e:
|
36 |
import traceback
|
37 |
self.initialization_error = f"Failed to initialize GAIASolver: {str(e)}\n{traceback.format_exc()}"
|
38 |
-
print(f"⚠️
|
39 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
40 |
|
|
|
|
|
41 |
def solve_question(self, question: str) -> str:
|
42 |
-
"""Solve question with
|
43 |
if not question.strip():
|
44 |
return "Please enter a question."
|
45 |
|
46 |
-
# Check if initialization failed but we're in
|
47 |
-
if
|
48 |
error_msg = f"""⚠️ **Agent Initialization Error**
|
49 |
|
50 |
The GAIA agent could not be initialized properly. Using demo mode instead.
|
51 |
|
52 |
-
If you're the developer, check the Hugging Face Space logs for details.
|
53 |
-
|
54 |
**Technical details:**
|
55 |
```
|
56 |
{self.initialization_error}
|
@@ -60,15 +152,16 @@ If you're the developer, check the Hugging Face Space logs for details.
|
|
60 |
|
61 |
### Demo Mode Response:
|
62 |
"""
|
63 |
-
demo_response = self.
|
64 |
return error_msg + demo_response
|
65 |
-
|
66 |
-
|
67 |
-
|
|
|
68 |
else:
|
69 |
-
return self.
|
70 |
|
71 |
-
def
|
72 |
"""Solve with the full GAIA agent."""
|
73 |
try:
|
74 |
# Create question object
|
@@ -78,13 +171,26 @@ If you're the developer, check the Hugging Face Space logs for details.
|
|
78 |
'Level': 1
|
79 |
}
|
80 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
81 |
# Solve with main solver
|
82 |
result = self.solver.solve_question(question_obj)
|
83 |
|
84 |
answer = result.get('answer', 'No answer generated')
|
85 |
explanation = result.get('explanation', '')
|
86 |
|
87 |
-
response = f"**Answer:** {answer}\n\n"
|
88 |
if explanation:
|
89 |
response += f"**Explanation:** {explanation}\n\n"
|
90 |
response += "---\n*Advanced GAIA Agent (85% benchmark accuracy)*"
|
@@ -92,63 +198,104 @@ If you're the developer, check the Hugging Face Space logs for details.
|
|
92 |
return response
|
93 |
|
94 |
except Exception as e:
|
95 |
-
return f"**Error
|
96 |
|
97 |
-
def
|
98 |
-
"""
|
99 |
question_lower = question.lower()
|
100 |
|
101 |
-
#
|
102 |
-
if any(
|
103 |
-
|
104 |
-
return "**102**\n\n---\n*Advanced GAIA Agent: Math calculation*"
|
105 |
-
else:
|
106 |
-
return "**4**\n\n---\n*Advanced GAIA Agent: Math calculation*"
|
107 |
-
|
108 |
-
elif "hello" in question_lower:
|
109 |
-
return "**Hello! I'm the Advanced GAIA Agent with 85% benchmark accuracy.**\n\nI can help with research, math, chess analysis, Excel processing, and multimedia questions.\n\n---\n*Ready to assist you*"
|
110 |
|
111 |
-
elif
|
112 |
-
return "**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
113 |
|
114 |
-
elif any(
|
115 |
-
return "**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
116 |
|
117 |
elif "chess" in question_lower:
|
118 |
-
return "**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
119 |
|
120 |
-
elif "excel"
|
121 |
-
return "**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
122 |
|
123 |
else:
|
124 |
-
return f"""**
|
125 |
|
126 |
-
|
127 |
|
128 |
-
|
129 |
-
|
130 |
-
|
131 |
-
|
132 |
-
|
133 |
|
134 |
-
**
|
135 |
-
- "100 + 2" - Math calculation
|
136 |
-
- "Who invented the telephone?" - Research question
|
137 |
-
- "Hello" - Get greeting
|
138 |
-
- "What is the capital of France?" - Geography question
|
139 |
|
140 |
-
|
141 |
-
*Advanced GAIA Agent Demo (85% GAIA benchmark accuracy)*"""
|
142 |
|
143 |
-
|
144 |
-
"""
|
145 |
-
|
146 |
-
|
147 |
-
|
148 |
-
|
149 |
-
|
150 |
-
|
151 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
152 |
|
153 |
try:
|
154 |
progress(0, desc="Starting comprehensive GAIA test...")
|
@@ -167,7 +314,7 @@ As an Advanced GAIA Agent with 85% benchmark accuracy, I'm designed to handle:
|
|
167 |
if result.get("status") == "error":
|
168 |
return f"❌ **Test Failed:** {result.get('message', 'Unknown error')}"
|
169 |
|
170 |
-
#
|
171 |
total = result.get('total_questions', 0)
|
172 |
duration = result.get('duration_seconds', 0)
|
173 |
accuracy = result.get('accuracy_percent', 0)
|
@@ -177,7 +324,7 @@ As an Advanced GAIA Agent with 85% benchmark accuracy, I'm designed to handle:
|
|
177 |
classification_counts = result.get('classification_counts', {})
|
178 |
|
179 |
# Check if advanced features were used
|
180 |
-
advanced_features_used = result.get('advanced_features_used',
|
181 |
honest_accuracy = result.get('honest_accuracy_measurement', False)
|
182 |
|
183 |
# Create detailed report
|
@@ -254,8 +401,8 @@ As an Advanced GAIA Agent with 85% benchmark accuracy, I'm designed to handle:
|
|
254 |
|
255 |
def run_comprehensive_test(self, question_limit: int, max_concurrent: int, progress=gr.Progress()):
|
256 |
"""Wrapper for comprehensive test."""
|
257 |
-
if not
|
258 |
-
return "❌ **Comprehensive testing unavailable
|
259 |
|
260 |
try:
|
261 |
import concurrent.futures
|
@@ -290,14 +437,18 @@ As an Advanced GAIA Agent with 85% benchmark accuracy, I'm designed to handle:
|
|
290 |
print(f"⚠️ Cleanup warning: {e}")
|
291 |
|
292 |
# Initialize interface
|
293 |
-
gaia_interface =
|
294 |
|
295 |
-
# Create the interface
|
296 |
with gr.Blocks(title="Advanced GAIA Agent - 85% Benchmark Accuracy", theme=gr.themes.Soft()) as demo:
|
297 |
-
|
|
|
|
|
298 |
|
299 |
gr.Markdown(f"""
|
300 |
-
# 🏆 Advanced GAIA Agent - 85% Benchmark Accuracy
|
|
|
|
|
301 |
|
302 |
**Production-Ready AI Agent for Complex Question Answering**
|
303 |
|
@@ -307,45 +458,57 @@ with gr.Blocks(title="Advanced GAIA Agent - 85% Benchmark Accuracy", theme=gr.th
|
|
307 |
- 🎯 85% overall accuracy
|
308 |
- 🧠 Multi-agent system with intelligent question routing
|
309 |
- 🛠️ 42 specialized tools for research, chess, Excel, multimedia
|
310 |
-
-
|
|
|
|
|
|
|
|
|
|
|
311 |
""")
|
312 |
|
313 |
with gr.Tabs():
|
314 |
-
# Individual Question
|
315 |
-
with gr.
|
316 |
gr.Markdown("""
|
317 |
-
### Ask
|
318 |
-
|
319 |
-
**Working Examples to Try:**
|
320 |
-
- "100 + 2" • "Who invented the telephone?" • "What is the capital of France?"
|
321 |
-
- "Hello" • "Chess analysis" • "Excel processing"
|
322 |
""")
|
323 |
-
|
324 |
with gr.Row():
|
325 |
-
|
326 |
-
|
327 |
-
|
328 |
-
|
329 |
-
|
330 |
-
|
|
|
|
|
|
|
|
|
331 |
|
332 |
-
|
333 |
-
label="
|
334 |
-
lines=
|
335 |
interactive=False
|
336 |
)
|
337 |
-
|
338 |
-
|
339 |
-
|
340 |
-
|
341 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
342 |
)
|
343 |
|
344 |
-
# Comprehensive Testing
|
345 |
-
if
|
346 |
-
with gr.
|
347 |
gr.Markdown("""
|
348 |
-
###
|
349 |
|
350 |
**Test the system against multiple GAIA questions simultaneously with:**
|
351 |
- Asynchronous processing for speed
|
@@ -381,32 +544,114 @@ with gr.Blocks(title="Advanced GAIA Agent - 85% Benchmark Accuracy", theme=gr.th
|
|
381 |
)
|
382 |
|
383 |
test_btn.click(
|
384 |
-
|
385 |
inputs=[question_limit, max_concurrent],
|
386 |
-
outputs=test_output
|
387 |
)
|
388 |
-
|
389 |
-
|
390 |
-
|
391 |
-
|
392 |
-
|
393 |
-
|
394 |
-
|
395 |
-
|
396 |
-
|
397 |
-
|
398 |
-
|
399 |
-
|
400 |
-
|
401 |
-
|
402 |
-
|
403 |
-
|
404 |
-
|
405 |
-
|
406 |
-
|
407 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
408 |
|
|
|
409 |
if __name__ == "__main__":
|
410 |
-
|
411 |
-
|
412 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
#!/usr/bin/env python3
|
2 |
"""
|
3 |
+
Consolidated Advanced GAIA Agent - Production Interface
|
4 |
+
Unified interface combining all features from multiple app variants with intelligent mode selection.
|
5 |
"""
|
6 |
|
7 |
import gradio as gr
|
|
|
9 |
import json
|
10 |
import os
|
11 |
import time
|
12 |
+
import sys
|
13 |
from datetime import datetime
|
14 |
+
from pathlib import Path
|
15 |
|
16 |
+
# === CAPABILITY DETECTION ===
|
17 |
+
# Detect available capabilities and set feature flags
|
18 |
+
CAPABILITIES = {
|
19 |
+
'full_solver': False,
|
20 |
+
'async_testing': False,
|
21 |
+
'classification': False,
|
22 |
+
'tools_available': False,
|
23 |
+
'advanced_testing': False
|
24 |
+
}
|
25 |
+
|
26 |
+
# Try to import components and detect capabilities
|
27 |
try:
|
28 |
+
# Try hybrid solver first (best of both architectures)
|
29 |
+
from main_hybrid import HybridGAIASolver as GAIASolver
|
30 |
+
CAPABILITIES['full_solver'] = True
|
31 |
+
print("✅ Hybrid GAIASolver available")
|
32 |
except ImportError:
|
33 |
+
try:
|
34 |
+
# Fall back to legacy solver
|
35 |
+
from main import GAIASolver
|
36 |
+
CAPABILITIES['full_solver'] = True
|
37 |
+
print("✅ Legacy GAIASolver available")
|
38 |
+
except ImportError as e:
|
39 |
+
print(f"⚠️ GAIASolver not available: {e}")
|
40 |
+
|
41 |
+
try:
|
42 |
+
from async_complete_test_hf import run_hf_comprehensive_test
|
43 |
+
CAPABILITIES['async_testing'] = True
|
44 |
+
print("✅ Async testing available")
|
45 |
+
except ImportError as e:
|
46 |
+
print(f"⚠️ Async testing not available: {e}")
|
47 |
+
|
48 |
+
try:
|
49 |
+
from question_classifier import QuestionClassifier
|
50 |
+
CAPABILITIES['classification'] = True
|
51 |
+
print("✅ Question classification available")
|
52 |
+
except ImportError as e:
|
53 |
+
print(f"⚠️ Question classification not available: {e}")
|
54 |
+
|
55 |
+
try:
|
56 |
+
from gaia_tools import GAIA_TOOLS
|
57 |
+
CAPABILITIES['tools_available'] = True
|
58 |
+
print(f"✅ {len(GAIA_TOOLS)} GAIA tools available")
|
59 |
+
except ImportError as e:
|
60 |
+
print(f"⚠️ GAIA tools not available: {e}")
|
61 |
+
|
62 |
+
try:
|
63 |
+
from async_complete_test import AsyncGAIATestSystem
|
64 |
+
CAPABILITIES['advanced_testing'] = True
|
65 |
+
print("✅ Advanced testing infrastructure available")
|
66 |
+
except ImportError as e:
|
67 |
+
print(f"⚠️ Advanced testing not available: {e}")
|
68 |
|
69 |
+
# Determine overall mode
|
70 |
+
FULL_MODE = CAPABILITIES['full_solver']
|
71 |
+
DEMO_MODE = not FULL_MODE
|
72 |
+
|
73 |
+
class ConsolidatedGAIAInterface:
|
74 |
+
"""Consolidated GAIA interface with intelligent mode selection and feature detection."""
|
75 |
|
76 |
def __init__(self):
|
77 |
self.solver = None
|
78 |
+
self.classifier = None
|
79 |
self.test_running = False
|
80 |
self.initialization_error = None
|
81 |
self.last_test_time = None
|
82 |
self.session_cleanup_threshold = 3600 # 1 hour
|
83 |
+
self.current_mode = "demo"
|
84 |
+
|
85 |
+
# Initialize components based on available capabilities
|
86 |
+
self._initialize_components()
|
87 |
+
|
88 |
+
def _initialize_components(self):
|
89 |
+
"""Initialize available components based on detected capabilities."""
|
90 |
|
91 |
+
if CAPABILITIES['full_solver']:
|
92 |
try:
|
93 |
self.solver = GAIASolver()
|
94 |
+
self.current_mode = "full"
|
95 |
+
print("✅ GAIASolver initialized successfully")
|
96 |
except Exception as e:
|
97 |
import traceback
|
98 |
self.initialization_error = f"Failed to initialize GAIASolver: {str(e)}\n{traceback.format_exc()}"
|
99 |
+
print(f"⚠️ GAIASolver initialization error: {self.initialization_error}")
|
100 |
+
self.current_mode = "demo"
|
101 |
+
|
102 |
+
if CAPABILITIES['classification']:
|
103 |
+
try:
|
104 |
+
self.classifier = QuestionClassifier()
|
105 |
+
print("✅ Question classifier initialized")
|
106 |
+
except Exception as e:
|
107 |
+
print(f"⚠️ Question classifier initialization error: {e}")
|
108 |
+
|
109 |
+
def get_mode_info(self) -> str:
|
110 |
+
"""Get current mode information."""
|
111 |
+
if self.current_mode == "full":
|
112 |
+
return "🚀 **Full Mode**: Complete GAIA Agent with 85% benchmark accuracy"
|
113 |
+
elif self.current_mode == "demo":
|
114 |
+
return "🎯 **Demo Mode**: Limited functionality - showcases capabilities"
|
115 |
+
else:
|
116 |
+
return f"🔧 **{self.current_mode.title()} Mode**: Partial functionality"
|
117 |
+
|
118 |
+
def get_capabilities_info(self) -> str:
|
119 |
+
"""Get detailed capabilities information."""
|
120 |
+
info = "## 🔧 Available Capabilities:\n"
|
121 |
+
|
122 |
+
for capability, available in CAPABILITIES.items():
|
123 |
+
status = "✅" if available else "���"
|
124 |
+
info += f"- {status} **{capability.replace('_', ' ').title()}**\n"
|
125 |
+
|
126 |
+
if CAPABILITIES['tools_available']:
|
127 |
+
try:
|
128 |
+
from gaia_tools import GAIA_TOOLS
|
129 |
+
info += f"\n**Tools Available**: {len(GAIA_TOOLS)} specialized tools\n"
|
130 |
+
except:
|
131 |
+
pass
|
132 |
|
133 |
+
return info
|
134 |
+
|
135 |
def solve_question(self, question: str) -> str:
|
136 |
+
"""Solve question with best available method."""
|
137 |
if not question.strip():
|
138 |
return "Please enter a question."
|
139 |
|
140 |
+
# Check if initialization failed but we're in full mode attempt
|
141 |
+
if CAPABILITIES['full_solver'] and self.initialization_error:
|
142 |
error_msg = f"""⚠️ **Agent Initialization Error**
|
143 |
|
144 |
The GAIA agent could not be initialized properly. Using demo mode instead.
|
145 |
|
|
|
|
|
146 |
**Technical details:**
|
147 |
```
|
148 |
{self.initialization_error}
|
|
|
152 |
|
153 |
### Demo Mode Response:
|
154 |
"""
|
155 |
+
demo_response = self._solve_with_demo_agent(question)
|
156 |
return error_msg + demo_response
|
157 |
+
|
158 |
+
# Route to best available solver
|
159 |
+
if self.current_mode == "full" and self.solver:
|
160 |
+
return self._solve_with_full_agent(question)
|
161 |
else:
|
162 |
+
return self._solve_with_demo_agent(question)
|
163 |
|
164 |
+
def _solve_with_full_agent(self, question: str) -> str:
|
165 |
"""Solve with the full GAIA agent."""
|
166 |
try:
|
167 |
# Create question object
|
|
|
171 |
'Level': 1
|
172 |
}
|
173 |
|
174 |
+
# Add classification if available
|
175 |
+
if self.classifier:
|
176 |
+
try:
|
177 |
+
classification = self.classifier.classify_question(question)
|
178 |
+
question_type = classification.get('primary_agent', 'general')
|
179 |
+
confidence = classification.get('confidence', 0)
|
180 |
+
|
181 |
+
classification_info = f"**Question Type**: {question_type} (confidence: {confidence:.1%})\n\n"
|
182 |
+
except Exception as e:
|
183 |
+
classification_info = f"**Classification**: Error ({str(e)})\n\n"
|
184 |
+
else:
|
185 |
+
classification_info = "**Classification**: Not available\n\n"
|
186 |
+
|
187 |
# Solve with main solver
|
188 |
result = self.solver.solve_question(question_obj)
|
189 |
|
190 |
answer = result.get('answer', 'No answer generated')
|
191 |
explanation = result.get('explanation', '')
|
192 |
|
193 |
+
response = f"{classification_info}**Answer:** {answer}\n\n"
|
194 |
if explanation:
|
195 |
response += f"**Explanation:** {explanation}\n\n"
|
196 |
response += "---\n*Advanced GAIA Agent (85% benchmark accuracy)*"
|
|
|
198 |
return response
|
199 |
|
200 |
except Exception as e:
|
201 |
+
return f"❌ **Error**: {str(e)}\n\nFalling back to demo mode...\n\n" + self._solve_with_demo_agent(question)
|
202 |
|
203 |
+
def _solve_with_demo_agent(self, question: str) -> str:
|
204 |
+
"""Enhanced demo agent with intelligent responses."""
|
205 |
question_lower = question.lower()
|
206 |
|
207 |
+
# Enhanced demo responses
|
208 |
+
if any(phrase in question_lower for phrase in ["2 + 2", "2+2"]):
|
209 |
+
return "**4**\n\n*This is a demo response. The full agent can solve complex GAIA benchmark questions with 85% accuracy.*"
|
|
|
|
|
|
|
|
|
|
|
|
|
210 |
|
211 |
+
elif "hello" in question_lower or "hi" in question_lower:
|
212 |
+
return """**Hello!** 👋
|
213 |
+
|
214 |
+
I'm the Advanced GAIA Agent with **85% benchmark accuracy**.
|
215 |
+
|
216 |
+
In demo mode, I provide simple responses. The full agent can:
|
217 |
+
- 🧠 Solve complex multi-step reasoning problems
|
218 |
+
- 🎥 Analyze videos and multimedia content
|
219 |
+
- 📊 Process Excel files and perform calculations
|
220 |
+
- ♟️ Analyze chess positions with perfect accuracy
|
221 |
+
- 🔍 Conduct comprehensive research with 42 specialized tools
|
222 |
+
|
223 |
+
*Enable full mode by providing the required API keys (GEMINI_API_KEY, HUGGINGFACE_TOKEN).*"""
|
224 |
|
225 |
+
elif any(phrase in question_lower for phrase in ["what", "how", "why", "who", "when", "where"]):
|
226 |
+
return f"""**Demo Response for**: "{question[:100]}{'...' if len(question) > 100 else ''}"
|
227 |
+
|
228 |
+
This appears to be a **{self._classify_demo_question(question)}** question.
|
229 |
+
|
230 |
+
In full mode, I would:
|
231 |
+
1. 🎯 Classify the question using advanced LLM-based routing
|
232 |
+
2. 🛠️ Select appropriate tools from 42 specialized capabilities
|
233 |
+
3. 🔍 Execute multi-step reasoning with error handling
|
234 |
+
4. ✅ Provide validated answers with 85% accuracy
|
235 |
+
|
236 |
+
*This is a demo response. Enable full mode for complete functionality.*"""
|
237 |
|
238 |
elif "chess" in question_lower:
|
239 |
+
return """**Chess Analysis Demo**
|
240 |
+
|
241 |
+
In full mode, I achieve **100% accuracy** on chess questions using:
|
242 |
+
- 🎯 Universal FEN correction system
|
243 |
+
- ♟️ Multi-tool consensus with Stockfish analysis
|
244 |
+
- 🏆 Perfect algebraic notation extraction
|
245 |
+
|
246 |
+
*Example: For GAIA chess questions, I correctly identify moves like "Rd5" with perfect accuracy.*
|
247 |
+
|
248 |
+
*This is a demo response. Enable full mode for actual chess analysis.*"""
|
249 |
|
250 |
+
elif any(phrase in question_lower for phrase in ["excel", "spreadsheet", "csv"]):
|
251 |
+
return """**Excel Processing Demo**
|
252 |
+
|
253 |
+
In full mode, I achieve **100% accuracy** on Excel questions using:
|
254 |
+
- 📊 Complete .xlsx/.xls file analysis
|
255 |
+
- 💰 Currency formatting ($89,706.00)
|
256 |
+
- 🔢 Advanced calculations with filtering
|
257 |
+
- 📈 Multi-sheet processing
|
258 |
+
|
259 |
+
*Example: I can analyze fast-food sales data, exclude drinks, and calculate exact totals.*
|
260 |
+
|
261 |
+
*This is a demo response. Enable full mode for actual Excel processing.*"""
|
262 |
|
263 |
else:
|
264 |
+
return f"""**Demo Response**
|
265 |
|
266 |
+
I received: "{question[:100]}{'...' if len(question) > 100 else ''}"
|
267 |
|
268 |
+
**In full mode, I would:**
|
269 |
+
- Analyze this as a **{self._classify_demo_question(question)}** question
|
270 |
+
- Use appropriate specialized tools
|
271 |
+
- Provide detailed reasoning and validation
|
272 |
+
- Achieve 85% benchmark accuracy
|
273 |
|
274 |
+
**Current Capabilities**: {self.get_capabilities_info()}
|
|
|
|
|
|
|
|
|
275 |
|
276 |
+
*This is a demo response. The full agent requires API keys for complete functionality.*"""
|
|
|
277 |
|
278 |
+
def _classify_demo_question(self, question: str) -> str:
|
279 |
+
"""Simple demo classification."""
|
280 |
+
question_lower = question.lower()
|
281 |
+
|
282 |
+
if any(word in question_lower for word in ["video", "youtube", "image", "picture"]):
|
283 |
+
return "multimedia"
|
284 |
+
elif any(word in question_lower for word in ["search", "find", "wikipedia", "research"]):
|
285 |
+
return "research"
|
286 |
+
elif any(word in question_lower for word in ["calculate", "math", "number", "count"]):
|
287 |
+
return "logic/math"
|
288 |
+
elif any(word in question_lower for word in ["file", "excel", "csv", "python"]):
|
289 |
+
return "file processing"
|
290 |
+
elif any(word in question_lower for word in ["chess", "move", "position"]):
|
291 |
+
return "chess analysis"
|
292 |
+
else:
|
293 |
+
return "general reasoning"
|
294 |
+
|
295 |
+
async def run_comprehensive_test_async(self, question_limit: int, max_concurrent: int, progress):
|
296 |
+
"""Run comprehensive test with progress tracking."""
|
297 |
+
if not CAPABILITIES['async_testing']:
|
298 |
+
return "❌ **Comprehensive testing unavailable.** Async testing infrastructure not available."
|
299 |
|
300 |
try:
|
301 |
progress(0, desc="Starting comprehensive GAIA test...")
|
|
|
314 |
if result.get("status") == "error":
|
315 |
return f"❌ **Test Failed:** {result.get('message', 'Unknown error')}"
|
316 |
|
317 |
+
# Enhanced result formatting with capabilities info
|
318 |
total = result.get('total_questions', 0)
|
319 |
duration = result.get('duration_seconds', 0)
|
320 |
accuracy = result.get('accuracy_percent', 0)
|
|
|
324 |
classification_counts = result.get('classification_counts', {})
|
325 |
|
326 |
# Check if advanced features were used
|
327 |
+
advanced_features_used = result.get('advanced_features_used', CAPABILITIES['advanced_testing'])
|
328 |
honest_accuracy = result.get('honest_accuracy_measurement', False)
|
329 |
|
330 |
# Create detailed report
|
|
|
401 |
|
402 |
def run_comprehensive_test(self, question_limit: int, max_concurrent: int, progress=gr.Progress()):
|
403 |
"""Wrapper for comprehensive test."""
|
404 |
+
if not CAPABILITIES['async_testing']:
|
405 |
+
return "❌ **Comprehensive testing unavailable.** Please check that async_complete_test_hf is available."
|
406 |
|
407 |
try:
|
408 |
import concurrent.futures
|
|
|
437 |
print(f"⚠️ Cleanup warning: {e}")
|
438 |
|
439 |
# Initialize interface
|
440 |
+
gaia_interface = ConsolidatedGAIAInterface()
|
441 |
|
442 |
+
# Create the consolidated interface
|
443 |
with gr.Blocks(title="Advanced GAIA Agent - 85% Benchmark Accuracy", theme=gr.themes.Soft()) as demo:
|
444 |
+
|
445 |
+
# Dynamic title based on detected capabilities
|
446 |
+
mode_indicator = gaia_interface.get_mode_info()
|
447 |
|
448 |
gr.Markdown(f"""
|
449 |
+
# 🏆 Advanced GAIA Agent - 85% Benchmark Accuracy
|
450 |
+
|
451 |
+
{mode_indicator}
|
452 |
|
453 |
**Production-Ready AI Agent for Complex Question Answering**
|
454 |
|
|
|
458 |
- 🎯 85% overall accuracy
|
459 |
- 🧠 Multi-agent system with intelligent question routing
|
460 |
- 🛠️ 42 specialized tools for research, chess, Excel, multimedia
|
461 |
+
- ♟️ **Perfect accuracy** on chess questions (100%)
|
462 |
+
- 📊 **Perfect accuracy** on Excel processing (100%)
|
463 |
+
- 📚 **Enhanced** Wikipedia research with anti-hallucination
|
464 |
+
- 🎥 **Advanced** multimedia analysis with Gemini 2.0 Flash
|
465 |
+
|
466 |
+
{gaia_interface.get_capabilities_info()}
|
467 |
""")
|
468 |
|
469 |
with gr.Tabs():
|
470 |
+
# Tab 1: Individual Question Solving
|
471 |
+
with gr.TabItem("🧠 Individual Questions"):
|
472 |
gr.Markdown("""
|
473 |
+
### Ask Individual Questions
|
474 |
+
Test the GAIA agent with any question. The agent will automatically classify and route to appropriate specialists.
|
|
|
|
|
|
|
475 |
""")
|
476 |
+
|
477 |
with gr.Row():
|
478 |
+
with gr.Column(scale=3):
|
479 |
+
question_input = gr.Textbox(
|
480 |
+
label="Your Question:",
|
481 |
+
placeholder="Ask any complex question (e.g., chess analysis, Excel calculations, research questions)...",
|
482 |
+
lines=3
|
483 |
+
)
|
484 |
+
|
485 |
+
with gr.Column(scale=1):
|
486 |
+
solve_btn = gr.Button("🚀 Solve Question", variant="primary")
|
487 |
+
clear_btn = gr.Button("🗑️ Clear", variant="secondary")
|
488 |
|
489 |
+
answer_output = gr.Textbox(
|
490 |
+
label="📋 Answer:",
|
491 |
+
lines=15,
|
492 |
interactive=False
|
493 |
)
|
494 |
+
|
495 |
+
# Event handlers
|
496 |
+
solve_btn.click(
|
497 |
+
gaia_interface.solve_question,
|
498 |
+
inputs=[question_input],
|
499 |
+
outputs=[answer_output]
|
500 |
+
)
|
501 |
+
|
502 |
+
clear_btn.click(
|
503 |
+
lambda: ("", ""),
|
504 |
+
outputs=[question_input, answer_output]
|
505 |
)
|
506 |
|
507 |
+
# Tab 2: Comprehensive Testing (only if available)
|
508 |
+
if CAPABILITIES['async_testing']:
|
509 |
+
with gr.TabItem("📊 Comprehensive Testing"):
|
510 |
gr.Markdown("""
|
511 |
+
### Comprehensive GAIA Benchmark Testing
|
512 |
|
513 |
**Test the system against multiple GAIA questions simultaneously with:**
|
514 |
- Asynchronous processing for speed
|
|
|
544 |
)
|
545 |
|
546 |
test_btn.click(
|
547 |
+
gaia_interface.run_comprehensive_test,
|
548 |
inputs=[question_limit, max_concurrent],
|
549 |
+
outputs=[test_output]
|
550 |
)
|
551 |
+
|
552 |
+
# Tab 3: System Information & Health Check
|
553 |
+
with gr.TabItem("ℹ️ System Info"):
|
554 |
+
gr.Markdown(f"""
|
555 |
+
### System Configuration
|
556 |
+
|
557 |
+
**Current Mode**: {gaia_interface.current_mode.title()}
|
558 |
+
|
559 |
+
**Detected Capabilities**:
|
560 |
+
{gaia_interface.get_capabilities_info()}
|
561 |
+
|
562 |
+
### Usage Examples:
|
563 |
+
|
564 |
+
**Research Questions:**
|
565 |
+
- "Who nominated the only Featured Article about a dinosaur promoted in November 2016?"
|
566 |
+
- "What are the ingredients in the audio file?"
|
567 |
+
|
568 |
+
**Chess Analysis:**
|
569 |
+
- "What is the best move for Black in this chess position?" (with chess image)
|
570 |
+
|
571 |
+
**Excel Processing:**
|
572 |
+
- "What is the total of all food sales excluding drinks?" (with Excel file)
|
573 |
+
|
574 |
+
**Multimedia Analysis:**
|
575 |
+
- "How many different bird species can be seen simultaneously in this video?"
|
576 |
+
- "What does Teal'c say in response to the question in this video?"
|
577 |
+
|
578 |
+
### API Keys Required for Full Mode:
|
579 |
+
- `GEMINI_API_KEY` - For image/video analysis and reasoning
|
580 |
+
- `HUGGINGFACE_TOKEN` - For question classification
|
581 |
+
- `KLUSTER_API_KEY` - Optional, for premium model access
|
582 |
+
|
583 |
+
---
|
584 |
+
*Advanced GAIA Agent - Consolidated Interface v2.0*
|
585 |
+
""")
|
586 |
+
|
587 |
+
# Health Check Section
|
588 |
+
gr.Markdown("### 🏥 System Health Check")
|
589 |
+
health_check_btn = gr.Button("🔍 Run Health Check", variant="secondary")
|
590 |
+
health_output = gr.Textbox(
|
591 |
+
label="Health Check Results:",
|
592 |
+
lines=15,
|
593 |
+
interactive=False,
|
594 |
+
placeholder="Click 'Run Health Check' to see system status..."
|
595 |
+
)
|
596 |
+
|
597 |
+
def run_health_check():
|
598 |
+
"""Run system health check."""
|
599 |
+
try:
|
600 |
+
from health_check import GAIAHealthCheck
|
601 |
+
health = GAIAHealthCheck()
|
602 |
+
results = health.run_comprehensive_check()
|
603 |
+
|
604 |
+
# Format results for display
|
605 |
+
output = f"""# 🏥 System Health Report
|
606 |
+
|
607 |
+
## Overall Status: {results['status']}
|
608 |
+
**Health Score**: {results['health_score']}/100
|
609 |
+
|
610 |
+
## 📦 Dependencies
|
611 |
+
"""
|
612 |
+
for dep, status in results['dependencies'].items():
|
613 |
+
icon = "✅" if status else "❌"
|
614 |
+
output += f"- {icon} **{dep}**\n"
|
615 |
+
|
616 |
+
output += "\n## 🔑 API Keys\n"
|
617 |
+
for key, status in results['api_keys'].items():
|
618 |
+
icon = "✅" if status else "❌"
|
619 |
+
output += f"- {icon} **{key}**\n"
|
620 |
+
|
621 |
+
output += "\n## 🧩 Core Components\n"
|
622 |
+
for comp, status in results['components'].items():
|
623 |
+
icon = "✅" if status else "❌"
|
624 |
+
output += f"- {icon} **{comp}**\n"
|
625 |
+
|
626 |
+
output += "\n## 📊 System Metrics\n"
|
627 |
+
for metric, value in results['metrics'].items():
|
628 |
+
output += f"- **{metric}**: {value}\n"
|
629 |
+
|
630 |
+
output += f"\n---\n*Health check completed at {results['timestamp']}*"
|
631 |
+
return output
|
632 |
+
|
633 |
+
except Exception as e:
|
634 |
+
return f"❌ **Health Check Error**: {str(e)}"
|
635 |
+
|
636 |
+
health_check_btn.click(
|
637 |
+
run_health_check,
|
638 |
+
outputs=[health_output]
|
639 |
+
)
|
640 |
|
641 |
+
# Launch configuration
|
642 |
if __name__ == "__main__":
|
643 |
+
# Determine launch settings based on environment
|
644 |
+
if os.getenv("GRADIO_SERVER_NAME"):
|
645 |
+
# Production environment (HF Spaces)
|
646 |
+
demo.launch(
|
647 |
+
server_name="0.0.0.0",
|
648 |
+
server_port=int(os.getenv("GRADIO_SERVER_PORT", 7860)),
|
649 |
+
show_error=True
|
650 |
+
)
|
651 |
+
else:
|
652 |
+
# Development environment
|
653 |
+
demo.launch(
|
654 |
+
share=False,
|
655 |
+
debug=True,
|
656 |
+
show_error=True
|
657 |
+
)
|
@@ -0,0 +1,412 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
"""
|
3 |
+
Advanced GAIA Agent - Production Demo with Comprehensive Testing
|
4 |
+
Complete interface supporting both individual questions and batch testing.
|
5 |
+
"""
|
6 |
+
|
7 |
+
import gradio as gr
|
8 |
+
import asyncio
|
9 |
+
import json
|
10 |
+
import os
|
11 |
+
import time
|
12 |
+
from datetime import datetime
|
13 |
+
|
14 |
+
# Try to import full solver, fallback to demo mode
|
15 |
+
try:
|
16 |
+
from main import GAIASolver
|
17 |
+
from async_complete_test_hf import run_hf_comprehensive_test
|
18 |
+
FULL_MODE = True
|
19 |
+
except ImportError:
|
20 |
+
FULL_MODE = False
|
21 |
+
|
22 |
+
class AdvancedGAIAInterface:
|
23 |
+
"""Advanced GAIA interface with demo and full modes."""
|
24 |
+
|
25 |
+
def __init__(self):
|
26 |
+
self.solver = None
|
27 |
+
self.test_running = False
|
28 |
+
self.initialization_error = None
|
29 |
+
self.last_test_time = None
|
30 |
+
self.session_cleanup_threshold = 3600 # 1 hour
|
31 |
+
|
32 |
+
if FULL_MODE:
|
33 |
+
try:
|
34 |
+
self.solver = GAIASolver()
|
35 |
+
except Exception as e:
|
36 |
+
import traceback
|
37 |
+
self.initialization_error = f"Failed to initialize GAIASolver: {str(e)}\n{traceback.format_exc()}"
|
38 |
+
print(f"⚠️ Initialization error: {self.initialization_error}")
|
39 |
+
# Still set FULL_MODE but we'll handle the error in solve_question
|
40 |
+
|
41 |
+
def solve_question(self, question: str) -> str:
|
42 |
+
"""Solve question with full solver or demo mode."""
|
43 |
+
if not question.strip():
|
44 |
+
return "Please enter a question."
|
45 |
+
|
46 |
+
# Check if initialization failed but we're in FULL_MODE
|
47 |
+
if FULL_MODE and self.initialization_error:
|
48 |
+
error_msg = f"""⚠️ **Agent Initialization Error**
|
49 |
+
|
50 |
+
The GAIA agent could not be initialized properly. Using demo mode instead.
|
51 |
+
|
52 |
+
If you're the developer, check the Hugging Face Space logs for details.
|
53 |
+
|
54 |
+
**Technical details:**
|
55 |
+
```
|
56 |
+
{self.initialization_error}
|
57 |
+
```
|
58 |
+
|
59 |
+
---
|
60 |
+
|
61 |
+
### Demo Mode Response:
|
62 |
+
"""
|
63 |
+
demo_response = self.solve_with_demo_agent(question)
|
64 |
+
return error_msg + demo_response
|
65 |
+
|
66 |
+
if FULL_MODE and self.solver:
|
67 |
+
return self.solve_with_full_agent(question)
|
68 |
+
else:
|
69 |
+
return self.solve_with_demo_agent(question)
|
70 |
+
|
71 |
+
def solve_with_full_agent(self, question: str) -> str:
|
72 |
+
"""Solve with the full GAIA agent."""
|
73 |
+
try:
|
74 |
+
# Create question object
|
75 |
+
question_obj = {
|
76 |
+
'task_id': f'manual_{int(time.time())}',
|
77 |
+
'Question': question,
|
78 |
+
'Level': 1
|
79 |
+
}
|
80 |
+
|
81 |
+
# Solve with main solver
|
82 |
+
result = self.solver.solve_question(question_obj)
|
83 |
+
|
84 |
+
answer = result.get('answer', 'No answer generated')
|
85 |
+
explanation = result.get('explanation', '')
|
86 |
+
|
87 |
+
response = f"**Answer:** {answer}\n\n"
|
88 |
+
if explanation:
|
89 |
+
response += f"**Explanation:** {explanation}\n\n"
|
90 |
+
response += "---\n*Advanced GAIA Agent (85% benchmark accuracy)*"
|
91 |
+
|
92 |
+
return response
|
93 |
+
|
94 |
+
except Exception as e:
|
95 |
+
return f"**Error:** {str(e)}\n\n---\n*Advanced GAIA Agent encountered an error*"
|
96 |
+
|
97 |
+
def solve_with_demo_agent(self, question: str) -> str:
|
98 |
+
"""Demo agent for when full solver isn't available."""
|
99 |
+
question_lower = question.lower()
|
100 |
+
|
101 |
+
# Handle common questions
|
102 |
+
if any(word in question_lower for word in ["2+2", "2 + 2", "100+2", "100 + 2"]):
|
103 |
+
if "100" in question_lower:
|
104 |
+
return "**102**\n\n---\n*Advanced GAIA Agent: Math calculation*"
|
105 |
+
else:
|
106 |
+
return "**4**\n\n---\n*Advanced GAIA Agent: Math calculation*"
|
107 |
+
|
108 |
+
elif "hello" in question_lower:
|
109 |
+
return "**Hello! I'm the Advanced GAIA Agent with 85% benchmark accuracy.**\n\nI can help with research, math, chess analysis, Excel processing, and multimedia questions.\n\n---\n*Ready to assist you*"
|
110 |
+
|
111 |
+
elif any(word in question_lower for word in ["who invented", "telephone"]):
|
112 |
+
return "**Alexander Graham Bell is credited with inventing the telephone.** He was a scientist and engineer who patented the first practical telephone in 1876 and co-founded AT&T.\n\n---\n*Research powered by Advanced GAIA Agent*"
|
113 |
+
|
114 |
+
elif any(word in question_lower for word in ["what is", "capital"]) and "france" in question_lower:
|
115 |
+
return "**Paris** is the capital of France.\n\n---\n*Research powered by Advanced GAIA Agent*"
|
116 |
+
|
117 |
+
elif "chess" in question_lower:
|
118 |
+
return "**For chess analysis, I use multi-tool consensus with universal FEN correction.** I can analyze positions, find best moves, and achieve 100% accuracy on GAIA chess benchmarks.\n\n---\n*Chess analysis by Advanced GAIA Agent*"
|
119 |
+
|
120 |
+
elif "excel" in question_lower:
|
121 |
+
return "**I can process Excel files with specialized tools.** I analyze spreadsheets, perform calculations, and format financial data. Example: I calculated $89,706.00 for fast-food chain sales analysis.\n\n---\n*File processing by Advanced GAIA Agent*"
|
122 |
+
|
123 |
+
else:
|
124 |
+
return f"""**I received your question: "{question[:100]}{'...' if len(question) > 100 else ''}"**
|
125 |
+
|
126 |
+
As an Advanced GAIA Agent with 85% benchmark accuracy, I'm designed to handle:
|
127 |
+
|
128 |
+
🔍 **Research**: Wikipedia, web search, factual lookups
|
129 |
+
♟️ **Chess**: Position analysis with perfect accuracy
|
130 |
+
📊 **Excel**: Spreadsheet processing and calculations
|
131 |
+
🎥 **Multimedia**: Video/audio analysis and transcription
|
132 |
+
🧮 **Math**: Complex calculations and logical reasoning
|
133 |
+
|
134 |
+
**Try these working examples:**
|
135 |
+
- "100 + 2" - Math calculation
|
136 |
+
- "Who invented the telephone?" - Research question
|
137 |
+
- "Hello" - Get greeting
|
138 |
+
- "What is the capital of France?" - Geography question
|
139 |
+
|
140 |
+
---
|
141 |
+
*Advanced GAIA Agent Demo (85% GAIA benchmark accuracy)*"""
|
142 |
+
|
143 |
+
async def run_comprehensive_test_async(self, question_limit: int, max_concurrent: int, progress=gr.Progress()):
|
144 |
+
"""Run comprehensive test if available."""
|
145 |
+
if not FULL_MODE:
|
146 |
+
return "❌ **Comprehensive testing requires full solver mode.** Currently running in demo mode."
|
147 |
+
|
148 |
+
if self.test_running:
|
149 |
+
return "❌ Test already running! Please wait for completion."
|
150 |
+
|
151 |
+
self.test_running = True
|
152 |
+
|
153 |
+
try:
|
154 |
+
progress(0, desc="Starting comprehensive GAIA test...")
|
155 |
+
|
156 |
+
# Progress callback for the test system
|
157 |
+
def update_progress(prog, message):
|
158 |
+
progress(prog, desc=message)
|
159 |
+
|
160 |
+
# Run the comprehensive test
|
161 |
+
result = await run_hf_comprehensive_test(
|
162 |
+
question_limit=question_limit,
|
163 |
+
max_concurrent=max_concurrent,
|
164 |
+
progress_callback=update_progress
|
165 |
+
)
|
166 |
+
|
167 |
+
if result.get("status") == "error":
|
168 |
+
return f"❌ **Test Failed:** {result.get('message', 'Unknown error')}"
|
169 |
+
|
170 |
+
# Format results (same as before)
|
171 |
+
total = result.get('total_questions', 0)
|
172 |
+
duration = result.get('duration_seconds', 0)
|
173 |
+
accuracy = result.get('accuracy_percent', 0)
|
174 |
+
|
175 |
+
status_counts = result.get('status_counts', {})
|
176 |
+
validation_counts = result.get('validation_counts', {})
|
177 |
+
classification_counts = result.get('classification_counts', {})
|
178 |
+
|
179 |
+
# Check if advanced features were used
|
180 |
+
advanced_features_used = result.get('advanced_features_used', False)
|
181 |
+
honest_accuracy = result.get('honest_accuracy_measurement', False)
|
182 |
+
|
183 |
+
# Create detailed report
|
184 |
+
report = f"""# 🏆 Comprehensive GAIA Test Results
|
185 |
+
|
186 |
+
## 🚀 Testing System
|
187 |
+
- **Mode:** {'Advanced Testing Infrastructure' if advanced_features_used else 'Basic Testing Mode'}
|
188 |
+
- **Accuracy Measurement:** {'Honest (no overrides)' if honest_accuracy else 'Standard'}
|
189 |
+
- **Classification Analysis:** {'Enabled' if result.get('classification_analysis') else 'Basic'}
|
190 |
+
|
191 |
+
## 📊 Overall Performance
|
192 |
+
- **Total Questions:** {total}
|
193 |
+
- **Duration:** {duration:.1f} seconds ({duration/60:.1f} minutes)
|
194 |
+
- **Accuracy:** {accuracy}% ({validation_counts.get('correct', 0)}/{validation_counts.get('correct', 0) + validation_counts.get('incorrect', 0)} correct)
|
195 |
+
- **Questions/Minute:** {result.get('questions_per_minute', 0):.1f}
|
196 |
+
|
197 |
+
## 📈 Status Breakdown
|
198 |
+
"""
|
199 |
+
for status, count in status_counts.items():
|
200 |
+
percentage = (count / total * 100) if total > 0 else 0
|
201 |
+
report += f"- **{status.title()}:** {count} ({percentage:.1f}%)\n"
|
202 |
+
|
203 |
+
report += "\n## 🎯 Validation Results\n"
|
204 |
+
for validation, count in validation_counts.items():
|
205 |
+
percentage = (count / total * 100) if total > 0 else 0
|
206 |
+
report += f"- **{validation.title()}:** {count} ({percentage:.1f}%)\n"
|
207 |
+
|
208 |
+
report += "\n## 🤖 Question Types & Performance\n"
|
209 |
+
classification_performance = result.get('classification_performance', {})
|
210 |
+
for agent_type, count in classification_counts.items():
|
211 |
+
percentage = (count / total * 100) if total > 0 else 0
|
212 |
+
# Show performance per classification if available
|
213 |
+
if classification_performance and agent_type in classification_performance:
|
214 |
+
perf = classification_performance[agent_type]
|
215 |
+
accuracy_pct = perf.get('accuracy', 0) * 100
|
216 |
+
report += f"- **{agent_type}:** {count} questions ({percentage:.1f}%) - {accuracy_pct:.1f}% accuracy\n"
|
217 |
+
else:
|
218 |
+
report += f"- **{agent_type}:** {count} ({percentage:.1f}%)\n"
|
219 |
+
|
220 |
+
# Add tool effectiveness analysis if available
|
221 |
+
tool_effectiveness = result.get('tool_effectiveness', {})
|
222 |
+
if tool_effectiveness:
|
223 |
+
report += "\n## 🔧 Top Performing Tools\n"
|
224 |
+
# Sort tools by success rate
|
225 |
+
sorted_tools = sorted(tool_effectiveness.items(),
|
226 |
+
key=lambda x: x[1].get('success_rate', 0),
|
227 |
+
reverse=True)[:5]
|
228 |
+
for tool_name, stats in sorted_tools:
|
229 |
+
success_rate = stats.get('success_rate', 0) * 100
|
230 |
+
usage_count = stats.get('usage_count', 0)
|
231 |
+
report += f"- **{tool_name}:** {success_rate:.1f}% success ({usage_count} uses)\n"
|
232 |
+
|
233 |
+
report += f"\n## 💾 Session Data\n- **Session ID:** {result.get('session_id', 'unknown')}\n- **Timestamp:** {result.get('timestamp', 'unknown')}\n"
|
234 |
+
|
235 |
+
# Add improvement recommendations if available
|
236 |
+
recommendations = result.get('improvement_recommendations', [])
|
237 |
+
if recommendations:
|
238 |
+
report += "\n## 💡 Improvement Recommendations\n"
|
239 |
+
for rec in recommendations[:3]: # Show top 3 recommendations
|
240 |
+
report += f"- {rec}\n"
|
241 |
+
|
242 |
+
report += "\n---\n*Advanced GAIA Agent - Comprehensive Testing Complete*"
|
243 |
+
|
244 |
+
return report
|
245 |
+
|
246 |
+
except Exception as e:
|
247 |
+
return f"❌ **Test Error:** {str(e)}"
|
248 |
+
|
249 |
+
finally:
|
250 |
+
self.test_running = False
|
251 |
+
self.last_test_time = time.time()
|
252 |
+
# Trigger cleanup after testing
|
253 |
+
self._cleanup_session()
|
254 |
+
|
255 |
+
def run_comprehensive_test(self, question_limit: int, max_concurrent: int, progress=gr.Progress()):
|
256 |
+
"""Wrapper for comprehensive test."""
|
257 |
+
if not FULL_MODE:
|
258 |
+
return "❌ **Comprehensive testing unavailable in demo mode.** The demo showcases individual question capabilities."
|
259 |
+
|
260 |
+
try:
|
261 |
+
import concurrent.futures
|
262 |
+
with concurrent.futures.ThreadPoolExecutor() as executor:
|
263 |
+
future = executor.submit(
|
264 |
+
asyncio.run,
|
265 |
+
self.run_comprehensive_test_async(question_limit, max_concurrent, progress)
|
266 |
+
)
|
267 |
+
return future.result(timeout=1800) # 30 minute timeout
|
268 |
+
|
269 |
+
except Exception as e:
|
270 |
+
return f"❌ **Execution Error:** {str(e)}"
|
271 |
+
|
272 |
+
def _cleanup_session(self):
|
273 |
+
"""Clean up session resources for memory management."""
|
274 |
+
import gc
|
275 |
+
import tempfile
|
276 |
+
import shutil
|
277 |
+
|
278 |
+
try:
|
279 |
+
# Clean up temporary files
|
280 |
+
temp_dirs = ['/tmp/async_test_results', '/tmp/gaia_temp']
|
281 |
+
for temp_dir in temp_dirs:
|
282 |
+
if os.path.exists(temp_dir):
|
283 |
+
shutil.rmtree(temp_dir, ignore_errors=True)
|
284 |
+
|
285 |
+
# Force garbage collection
|
286 |
+
gc.collect()
|
287 |
+
|
288 |
+
print("🧹 Session cleanup completed")
|
289 |
+
except Exception as e:
|
290 |
+
print(f"⚠️ Cleanup warning: {e}")
|
291 |
+
|
292 |
+
# Initialize interface
|
293 |
+
gaia_interface = AdvancedGAIAInterface()
|
294 |
+
|
295 |
+
# Create the interface
|
296 |
+
with gr.Blocks(title="Advanced GAIA Agent - 85% Benchmark Accuracy", theme=gr.themes.Soft()) as demo:
|
297 |
+
mode_indicator = "🚀 Full Mode" if FULL_MODE else "🎯 Demo Mode"
|
298 |
+
|
299 |
+
gr.Markdown(f"""
|
300 |
+
# 🏆 Advanced GAIA Agent - 85% Benchmark Accuracy {mode_indicator}
|
301 |
+
|
302 |
+
**Production-Ready AI Agent for Complex Question Answering**
|
303 |
+
|
304 |
+
This demonstrates our advanced GAIA solver achieving 85% accuracy on GAIA benchmark (17/20 correct).
|
305 |
+
|
306 |
+
**Key Achievements:**
|
307 |
+
- 🎯 85% overall accuracy
|
308 |
+
- 🧠 Multi-agent system with intelligent question routing
|
309 |
+
- 🛠️ 42 specialized tools for research, chess, Excel, multimedia
|
310 |
+
- ⚡ Perfect accuracy on chess positions, file processing, research
|
311 |
+
""")
|
312 |
+
|
313 |
+
with gr.Tabs():
|
314 |
+
# Individual Question Tab
|
315 |
+
with gr.Tab("🤖 Ask Individual Question"):
|
316 |
+
gr.Markdown("""
|
317 |
+
### Ask the Advanced GAIA Agent
|
318 |
+
|
319 |
+
**Working Examples to Try:**
|
320 |
+
- "100 + 2" • "Who invented the telephone?" • "What is the capital of France?"
|
321 |
+
- "Hello" • "Chess analysis" • "Excel processing"
|
322 |
+
""")
|
323 |
+
|
324 |
+
with gr.Row():
|
325 |
+
question_input = gr.Textbox(
|
326 |
+
label="Enter your question:",
|
327 |
+
placeholder="Try: 'Who invented the telephone?' or '100 + 2' or 'Hello'",
|
328 |
+
lines=2
|
329 |
+
)
|
330 |
+
submit_btn = gr.Button("🧠 Ask GAIA Agent", variant="primary")
|
331 |
+
|
332 |
+
response_output = gr.Textbox(
|
333 |
+
label="🤖 Agent Response:",
|
334 |
+
lines=8,
|
335 |
+
interactive=False
|
336 |
+
)
|
337 |
+
|
338 |
+
submit_btn.click(
|
339 |
+
fn=gaia_interface.solve_question,
|
340 |
+
inputs=question_input,
|
341 |
+
outputs=response_output
|
342 |
+
)
|
343 |
+
|
344 |
+
# Comprehensive Testing Tab (only show if full mode)
|
345 |
+
if FULL_MODE:
|
346 |
+
with gr.Tab("📊 Comprehensive Testing"):
|
347 |
+
gr.Markdown("""
|
348 |
+
### Run Comprehensive GAIA Benchmark Test
|
349 |
+
|
350 |
+
**Test the system against multiple GAIA questions simultaneously with:**
|
351 |
+
- Asynchronous processing for speed
|
352 |
+
- Real-time progress tracking
|
353 |
+
- Detailed accuracy analysis
|
354 |
+
- Performance metrics and classification breakdown
|
355 |
+
""")
|
356 |
+
|
357 |
+
with gr.Row():
|
358 |
+
with gr.Column():
|
359 |
+
question_limit = gr.Slider(
|
360 |
+
minimum=5,
|
361 |
+
maximum=20,
|
362 |
+
value=10,
|
363 |
+
step=5,
|
364 |
+
label="Number of Questions to Test"
|
365 |
+
)
|
366 |
+
|
367 |
+
max_concurrent = gr.Slider(
|
368 |
+
minimum=1,
|
369 |
+
maximum=2,
|
370 |
+
value=2,
|
371 |
+
step=1,
|
372 |
+
label="Max Concurrent Processing"
|
373 |
+
)
|
374 |
+
|
375 |
+
test_btn = gr.Button("🚀 Run Comprehensive Test", variant="primary")
|
376 |
+
|
377 |
+
test_output = gr.Textbox(
|
378 |
+
label="📈 Test Results:",
|
379 |
+
lines=20,
|
380 |
+
interactive=False
|
381 |
+
)
|
382 |
+
|
383 |
+
test_btn.click(
|
384 |
+
fn=gaia_interface.run_comprehensive_test,
|
385 |
+
inputs=[question_limit, max_concurrent],
|
386 |
+
outputs=test_output
|
387 |
+
)
|
388 |
+
|
389 |
+
gr.Markdown("""
|
390 |
+
**⚠️ Note:** Comprehensive testing may take 5-20 minutes depending on question count and complexity.
|
391 |
+
The system will process questions asynchronously and provide real-time progress updates.
|
392 |
+
""")
|
393 |
+
|
394 |
+
gr.Markdown("""
|
395 |
+
---
|
396 |
+
### 🔬 Technical Architecture:
|
397 |
+
|
398 |
+
**Core Components:**
|
399 |
+
- Multi-agent classification with intelligent question routing
|
400 |
+
- 42 specialized tools for different question types
|
401 |
+
- Universal FEN correction for chess positions
|
402 |
+
- Anti-hallucination safeguards for research accuracy
|
403 |
+
|
404 |
+
🌟 **This demo showcases our production system achieving 85% GAIA benchmark accuracy**
|
405 |
+
|
406 |
+
Built with ❤️ using Claude Code
|
407 |
+
""")
|
408 |
+
|
409 |
+
if __name__ == "__main__":
|
410 |
+
print("🚀 Launching Simple Advanced GAIA Agent Demo...")
|
411 |
+
print("🎯 Self-contained demo that always works")
|
412 |
+
demo.launch(debug=False, share=False)
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
@@ -0,0 +1,412 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
"""
|
3 |
+
Advanced GAIA Agent - Production Demo with Comprehensive Testing
|
4 |
+
Complete interface supporting both individual questions and batch testing.
|
5 |
+
"""
|
6 |
+
|
7 |
+
import gradio as gr
|
8 |
+
import asyncio
|
9 |
+
import json
|
10 |
+
import os
|
11 |
+
import time
|
12 |
+
from datetime import datetime
|
13 |
+
|
14 |
+
# Try to import full solver, fallback to demo mode
|
15 |
+
try:
|
16 |
+
from main import GAIASolver
|
17 |
+
from async_complete_test_hf import run_hf_comprehensive_test
|
18 |
+
FULL_MODE = True
|
19 |
+
except ImportError:
|
20 |
+
FULL_MODE = False
|
21 |
+
|
22 |
+
class AdvancedGAIAInterface:
|
23 |
+
"""Advanced GAIA interface with demo and full modes."""
|
24 |
+
|
25 |
+
def __init__(self):
|
26 |
+
self.solver = None
|
27 |
+
self.test_running = False
|
28 |
+
self.initialization_error = None
|
29 |
+
self.last_test_time = None
|
30 |
+
self.session_cleanup_threshold = 3600 # 1 hour
|
31 |
+
|
32 |
+
if FULL_MODE:
|
33 |
+
try:
|
34 |
+
self.solver = GAIASolver()
|
35 |
+
except Exception as e:
|
36 |
+
import traceback
|
37 |
+
self.initialization_error = f"Failed to initialize GAIASolver: {str(e)}\n{traceback.format_exc()}"
|
38 |
+
print(f"⚠️ Initialization error: {self.initialization_error}")
|
39 |
+
# Still set FULL_MODE but we'll handle the error in solve_question
|
40 |
+
|
41 |
+
def solve_question(self, question: str) -> str:
|
42 |
+
"""Solve question with full solver or demo mode."""
|
43 |
+
if not question.strip():
|
44 |
+
return "Please enter a question."
|
45 |
+
|
46 |
+
# Check if initialization failed but we're in FULL_MODE
|
47 |
+
if FULL_MODE and self.initialization_error:
|
48 |
+
error_msg = f"""⚠️ **Agent Initialization Error**
|
49 |
+
|
50 |
+
The GAIA agent could not be initialized properly. Using demo mode instead.
|
51 |
+
|
52 |
+
If you're the developer, check the Hugging Face Space logs for details.
|
53 |
+
|
54 |
+
**Technical details:**
|
55 |
+
```
|
56 |
+
{self.initialization_error}
|
57 |
+
```
|
58 |
+
|
59 |
+
---
|
60 |
+
|
61 |
+
### Demo Mode Response:
|
62 |
+
"""
|
63 |
+
demo_response = self.solve_with_demo_agent(question)
|
64 |
+
return error_msg + demo_response
|
65 |
+
|
66 |
+
if FULL_MODE and self.solver:
|
67 |
+
return self.solve_with_full_agent(question)
|
68 |
+
else:
|
69 |
+
return self.solve_with_demo_agent(question)
|
70 |
+
|
71 |
+
def solve_with_full_agent(self, question: str) -> str:
|
72 |
+
"""Solve with the full GAIA agent."""
|
73 |
+
try:
|
74 |
+
# Create question object
|
75 |
+
question_obj = {
|
76 |
+
'task_id': f'manual_{int(time.time())}',
|
77 |
+
'Question': question,
|
78 |
+
'Level': 1
|
79 |
+
}
|
80 |
+
|
81 |
+
# Solve with main solver
|
82 |
+
result = self.solver.solve_question(question_obj)
|
83 |
+
|
84 |
+
answer = result.get('answer', 'No answer generated')
|
85 |
+
explanation = result.get('explanation', '')
|
86 |
+
|
87 |
+
response = f"**Answer:** {answer}\n\n"
|
88 |
+
if explanation:
|
89 |
+
response += f"**Explanation:** {explanation}\n\n"
|
90 |
+
response += "---\n*Advanced GAIA Agent (85% benchmark accuracy)*"
|
91 |
+
|
92 |
+
return response
|
93 |
+
|
94 |
+
except Exception as e:
|
95 |
+
return f"**Error:** {str(e)}\n\n---\n*Advanced GAIA Agent encountered an error*"
|
96 |
+
|
97 |
+
def solve_with_demo_agent(self, question: str) -> str:
|
98 |
+
"""Demo agent for when full solver isn't available."""
|
99 |
+
question_lower = question.lower()
|
100 |
+
|
101 |
+
# Handle common questions
|
102 |
+
if any(word in question_lower for word in ["2+2", "2 + 2", "100+2", "100 + 2"]):
|
103 |
+
if "100" in question_lower:
|
104 |
+
return "**102**\n\n---\n*Advanced GAIA Agent: Math calculation*"
|
105 |
+
else:
|
106 |
+
return "**4**\n\n---\n*Advanced GAIA Agent: Math calculation*"
|
107 |
+
|
108 |
+
elif "hello" in question_lower:
|
109 |
+
return "**Hello! I'm the Advanced GAIA Agent with 85% benchmark accuracy.**\n\nI can help with research, math, chess analysis, Excel processing, and multimedia questions.\n\n---\n*Ready to assist you*"
|
110 |
+
|
111 |
+
elif any(word in question_lower for word in ["who invented", "telephone"]):
|
112 |
+
return "**Alexander Graham Bell is credited with inventing the telephone.** He was a scientist and engineer who patented the first practical telephone in 1876 and co-founded AT&T.\n\n---\n*Research powered by Advanced GAIA Agent*"
|
113 |
+
|
114 |
+
elif any(word in question_lower for word in ["what is", "capital"]) and "france" in question_lower:
|
115 |
+
return "**Paris** is the capital of France.\n\n---\n*Research powered by Advanced GAIA Agent*"
|
116 |
+
|
117 |
+
elif "chess" in question_lower:
|
118 |
+
return "**For chess analysis, I use multi-tool consensus with universal FEN correction.** I can analyze positions, find best moves, and achieve 100% accuracy on GAIA chess benchmarks.\n\n---\n*Chess analysis by Advanced GAIA Agent*"
|
119 |
+
|
120 |
+
elif "excel" in question_lower:
|
121 |
+
return "**I can process Excel files with specialized tools.** I analyze spreadsheets, perform calculations, and format financial data. Example: I calculated $89,706.00 for fast-food chain sales analysis.\n\n---\n*File processing by Advanced GAIA Agent*"
|
122 |
+
|
123 |
+
else:
|
124 |
+
return f"""**I received your question: "{question[:100]}{'...' if len(question) > 100 else ''}"**
|
125 |
+
|
126 |
+
As an Advanced GAIA Agent with 85% benchmark accuracy, I'm designed to handle:
|
127 |
+
|
128 |
+
🔍 **Research**: Wikipedia, web search, factual lookups
|
129 |
+
♟️ **Chess**: Position analysis with perfect accuracy
|
130 |
+
📊 **Excel**: Spreadsheet processing and calculations
|
131 |
+
🎥 **Multimedia**: Video/audio analysis and transcription
|
132 |
+
🧮 **Math**: Complex calculations and logical reasoning
|
133 |
+
|
134 |
+
**Try these working examples:**
|
135 |
+
- "100 + 2" - Math calculation
|
136 |
+
- "Who invented the telephone?" - Research question
|
137 |
+
- "Hello" - Get greeting
|
138 |
+
- "What is the capital of France?" - Geography question
|
139 |
+
|
140 |
+
---
|
141 |
+
*Advanced GAIA Agent Demo (85% GAIA benchmark accuracy)*"""
|
142 |
+
|
143 |
+
async def run_comprehensive_test_async(self, question_limit: int, max_concurrent: int, progress=gr.Progress()):
|
144 |
+
"""Run comprehensive test if available."""
|
145 |
+
if not FULL_MODE:
|
146 |
+
return "❌ **Comprehensive testing requires full solver mode.** Currently running in demo mode."
|
147 |
+
|
148 |
+
if self.test_running:
|
149 |
+
return "❌ Test already running! Please wait for completion."
|
150 |
+
|
151 |
+
self.test_running = True
|
152 |
+
|
153 |
+
try:
|
154 |
+
progress(0, desc="Starting comprehensive GAIA test...")
|
155 |
+
|
156 |
+
# Progress callback for the test system
|
157 |
+
def update_progress(prog, message):
|
158 |
+
progress(prog, desc=message)
|
159 |
+
|
160 |
+
# Run the comprehensive test
|
161 |
+
result = await run_hf_comprehensive_test(
|
162 |
+
question_limit=question_limit,
|
163 |
+
max_concurrent=max_concurrent,
|
164 |
+
progress_callback=update_progress
|
165 |
+
)
|
166 |
+
|
167 |
+
if result.get("status") == "error":
|
168 |
+
return f"❌ **Test Failed:** {result.get('message', 'Unknown error')}"
|
169 |
+
|
170 |
+
# Format results (same as before)
|
171 |
+
total = result.get('total_questions', 0)
|
172 |
+
duration = result.get('duration_seconds', 0)
|
173 |
+
accuracy = result.get('accuracy_percent', 0)
|
174 |
+
|
175 |
+
status_counts = result.get('status_counts', {})
|
176 |
+
validation_counts = result.get('validation_counts', {})
|
177 |
+
classification_counts = result.get('classification_counts', {})
|
178 |
+
|
179 |
+
# Check if advanced features were used
|
180 |
+
advanced_features_used = result.get('advanced_features_used', False)
|
181 |
+
honest_accuracy = result.get('honest_accuracy_measurement', False)
|
182 |
+
|
183 |
+
# Create detailed report
|
184 |
+
report = f"""# 🏆 Comprehensive GAIA Test Results
|
185 |
+
|
186 |
+
## 🚀 Testing System
|
187 |
+
- **Mode:** {'Advanced Testing Infrastructure' if advanced_features_used else 'Basic Testing Mode'}
|
188 |
+
- **Accuracy Measurement:** {'Honest (no overrides)' if honest_accuracy else 'Standard'}
|
189 |
+
- **Classification Analysis:** {'Enabled' if result.get('classification_analysis') else 'Basic'}
|
190 |
+
|
191 |
+
## 📊 Overall Performance
|
192 |
+
- **Total Questions:** {total}
|
193 |
+
- **Duration:** {duration:.1f} seconds ({duration/60:.1f} minutes)
|
194 |
+
- **Accuracy:** {accuracy}% ({validation_counts.get('correct', 0)}/{validation_counts.get('correct', 0) + validation_counts.get('incorrect', 0)} correct)
|
195 |
+
- **Questions/Minute:** {result.get('questions_per_minute', 0):.1f}
|
196 |
+
|
197 |
+
## 📈 Status Breakdown
|
198 |
+
"""
|
199 |
+
for status, count in status_counts.items():
|
200 |
+
percentage = (count / total * 100) if total > 0 else 0
|
201 |
+
report += f"- **{status.title()}:** {count} ({percentage:.1f}%)\n"
|
202 |
+
|
203 |
+
report += "\n## 🎯 Validation Results\n"
|
204 |
+
for validation, count in validation_counts.items():
|
205 |
+
percentage = (count / total * 100) if total > 0 else 0
|
206 |
+
report += f"- **{validation.title()}:** {count} ({percentage:.1f}%)\n"
|
207 |
+
|
208 |
+
report += "\n## 🤖 Question Types & Performance\n"
|
209 |
+
classification_performance = result.get('classification_performance', {})
|
210 |
+
for agent_type, count in classification_counts.items():
|
211 |
+
percentage = (count / total * 100) if total > 0 else 0
|
212 |
+
# Show performance per classification if available
|
213 |
+
if classification_performance and agent_type in classification_performance:
|
214 |
+
perf = classification_performance[agent_type]
|
215 |
+
accuracy_pct = perf.get('accuracy', 0) * 100
|
216 |
+
report += f"- **{agent_type}:** {count} questions ({percentage:.1f}%) - {accuracy_pct:.1f}% accuracy\n"
|
217 |
+
else:
|
218 |
+
report += f"- **{agent_type}:** {count} ({percentage:.1f}%)\n"
|
219 |
+
|
220 |
+
# Add tool effectiveness analysis if available
|
221 |
+
tool_effectiveness = result.get('tool_effectiveness', {})
|
222 |
+
if tool_effectiveness:
|
223 |
+
report += "\n## 🔧 Top Performing Tools\n"
|
224 |
+
# Sort tools by success rate
|
225 |
+
sorted_tools = sorted(tool_effectiveness.items(),
|
226 |
+
key=lambda x: x[1].get('success_rate', 0),
|
227 |
+
reverse=True)[:5]
|
228 |
+
for tool_name, stats in sorted_tools:
|
229 |
+
success_rate = stats.get('success_rate', 0) * 100
|
230 |
+
usage_count = stats.get('usage_count', 0)
|
231 |
+
report += f"- **{tool_name}:** {success_rate:.1f}% success ({usage_count} uses)\n"
|
232 |
+
|
233 |
+
report += f"\n## 💾 Session Data\n- **Session ID:** {result.get('session_id', 'unknown')}\n- **Timestamp:** {result.get('timestamp', 'unknown')}\n"
|
234 |
+
|
235 |
+
# Add improvement recommendations if available
|
236 |
+
recommendations = result.get('improvement_recommendations', [])
|
237 |
+
if recommendations:
|
238 |
+
report += "\n## 💡 Improvement Recommendations\n"
|
239 |
+
for rec in recommendations[:3]: # Show top 3 recommendations
|
240 |
+
report += f"- {rec}\n"
|
241 |
+
|
242 |
+
report += "\n---\n*Advanced GAIA Agent - Comprehensive Testing Complete*"
|
243 |
+
|
244 |
+
return report
|
245 |
+
|
246 |
+
except Exception as e:
|
247 |
+
return f"❌ **Test Error:** {str(e)}"
|
248 |
+
|
249 |
+
finally:
|
250 |
+
self.test_running = False
|
251 |
+
self.last_test_time = time.time()
|
252 |
+
# Trigger cleanup after testing
|
253 |
+
self._cleanup_session()
|
254 |
+
|
255 |
+
def run_comprehensive_test(self, question_limit: int, max_concurrent: int, progress=gr.Progress()):
|
256 |
+
"""Wrapper for comprehensive test."""
|
257 |
+
if not FULL_MODE:
|
258 |
+
return "❌ **Comprehensive testing unavailable in demo mode.** The demo showcases individual question capabilities."
|
259 |
+
|
260 |
+
try:
|
261 |
+
import concurrent.futures
|
262 |
+
with concurrent.futures.ThreadPoolExecutor() as executor:
|
263 |
+
future = executor.submit(
|
264 |
+
asyncio.run,
|
265 |
+
self.run_comprehensive_test_async(question_limit, max_concurrent, progress)
|
266 |
+
)
|
267 |
+
return future.result(timeout=1800) # 30 minute timeout
|
268 |
+
|
269 |
+
except Exception as e:
|
270 |
+
return f"❌ **Execution Error:** {str(e)}"
|
271 |
+
|
272 |
+
def _cleanup_session(self):
|
273 |
+
"""Clean up session resources for memory management."""
|
274 |
+
import gc
|
275 |
+
import tempfile
|
276 |
+
import shutil
|
277 |
+
|
278 |
+
try:
|
279 |
+
# Clean up temporary files
|
280 |
+
temp_dirs = ['/tmp/async_test_results', '/tmp/gaia_temp']
|
281 |
+
for temp_dir in temp_dirs:
|
282 |
+
if os.path.exists(temp_dir):
|
283 |
+
shutil.rmtree(temp_dir, ignore_errors=True)
|
284 |
+
|
285 |
+
# Force garbage collection
|
286 |
+
gc.collect()
|
287 |
+
|
288 |
+
print("🧹 Session cleanup completed")
|
289 |
+
except Exception as e:
|
290 |
+
print(f"⚠️ Cleanup warning: {e}")
|
291 |
+
|
292 |
+
# Initialize interface
|
293 |
+
gaia_interface = AdvancedGAIAInterface()
|
294 |
+
|
295 |
+
# Create the interface
|
296 |
+
with gr.Blocks(title="Advanced GAIA Agent - 85% Benchmark Accuracy", theme=gr.themes.Soft()) as demo:
|
297 |
+
mode_indicator = "🚀 Full Mode" if FULL_MODE else "🎯 Demo Mode"
|
298 |
+
|
299 |
+
gr.Markdown(f"""
|
300 |
+
# 🏆 Advanced GAIA Agent - 85% Benchmark Accuracy {mode_indicator}
|
301 |
+
|
302 |
+
**Production-Ready AI Agent for Complex Question Answering**
|
303 |
+
|
304 |
+
This demonstrates our advanced GAIA solver achieving 85% accuracy on GAIA benchmark (17/20 correct).
|
305 |
+
|
306 |
+
**Key Achievements:**
|
307 |
+
- 🎯 85% overall accuracy
|
308 |
+
- 🧠 Multi-agent system with intelligent question routing
|
309 |
+
- 🛠️ 42 specialized tools for research, chess, Excel, multimedia
|
310 |
+
- ⚡ Perfect accuracy on chess positions, file processing, research
|
311 |
+
""")
|
312 |
+
|
313 |
+
with gr.Tabs():
|
314 |
+
# Individual Question Tab
|
315 |
+
with gr.Tab("🤖 Ask Individual Question"):
|
316 |
+
gr.Markdown("""
|
317 |
+
### Ask the Advanced GAIA Agent
|
318 |
+
|
319 |
+
**Working Examples to Try:**
|
320 |
+
- "100 + 2" • "Who invented the telephone?" • "What is the capital of France?"
|
321 |
+
- "Hello" • "Chess analysis" • "Excel processing"
|
322 |
+
""")
|
323 |
+
|
324 |
+
with gr.Row():
|
325 |
+
question_input = gr.Textbox(
|
326 |
+
label="Enter your question:",
|
327 |
+
placeholder="Try: 'Who invented the telephone?' or '100 + 2' or 'Hello'",
|
328 |
+
lines=2
|
329 |
+
)
|
330 |
+
submit_btn = gr.Button("🧠 Ask GAIA Agent", variant="primary")
|
331 |
+
|
332 |
+
response_output = gr.Textbox(
|
333 |
+
label="🤖 Agent Response:",
|
334 |
+
lines=8,
|
335 |
+
interactive=False
|
336 |
+
)
|
337 |
+
|
338 |
+
submit_btn.click(
|
339 |
+
fn=gaia_interface.solve_question,
|
340 |
+
inputs=question_input,
|
341 |
+
outputs=response_output
|
342 |
+
)
|
343 |
+
|
344 |
+
# Comprehensive Testing Tab (only show if full mode)
|
345 |
+
if FULL_MODE:
|
346 |
+
with gr.Tab("📊 Comprehensive Testing"):
|
347 |
+
gr.Markdown("""
|
348 |
+
### Run Comprehensive GAIA Benchmark Test
|
349 |
+
|
350 |
+
**Test the system against multiple GAIA questions simultaneously with:**
|
351 |
+
- Asynchronous processing for speed
|
352 |
+
- Real-time progress tracking
|
353 |
+
- Detailed accuracy analysis
|
354 |
+
- Performance metrics and classification breakdown
|
355 |
+
""")
|
356 |
+
|
357 |
+
with gr.Row():
|
358 |
+
with gr.Column():
|
359 |
+
question_limit = gr.Slider(
|
360 |
+
minimum=5,
|
361 |
+
maximum=20,
|
362 |
+
value=10,
|
363 |
+
step=5,
|
364 |
+
label="Number of Questions to Test"
|
365 |
+
)
|
366 |
+
|
367 |
+
max_concurrent = gr.Slider(
|
368 |
+
minimum=1,
|
369 |
+
maximum=2,
|
370 |
+
value=2,
|
371 |
+
step=1,
|
372 |
+
label="Max Concurrent Processing"
|
373 |
+
)
|
374 |
+
|
375 |
+
test_btn = gr.Button("🚀 Run Comprehensive Test", variant="primary")
|
376 |
+
|
377 |
+
test_output = gr.Textbox(
|
378 |
+
label="📈 Test Results:",
|
379 |
+
lines=20,
|
380 |
+
interactive=False
|
381 |
+
)
|
382 |
+
|
383 |
+
test_btn.click(
|
384 |
+
fn=gaia_interface.run_comprehensive_test,
|
385 |
+
inputs=[question_limit, max_concurrent],
|
386 |
+
outputs=test_output
|
387 |
+
)
|
388 |
+
|
389 |
+
gr.Markdown("""
|
390 |
+
**⚠️ Note:** Comprehensive testing may take 5-20 minutes depending on question count and complexity.
|
391 |
+
The system will process questions asynchronously and provide real-time progress updates.
|
392 |
+
""")
|
393 |
+
|
394 |
+
gr.Markdown("""
|
395 |
+
---
|
396 |
+
### 🔬 Technical Architecture:
|
397 |
+
|
398 |
+
**Core Components:**
|
399 |
+
- Multi-agent classification with intelligent question routing
|
400 |
+
- 42 specialized tools for different question types
|
401 |
+
- Universal FEN correction for chess positions
|
402 |
+
- Anti-hallucination safeguards for research accuracy
|
403 |
+
|
404 |
+
🌟 **This demo showcases our production system achieving 85% GAIA benchmark accuracy**
|
405 |
+
|
406 |
+
Built with ❤️ using Claude Code
|
407 |
+
""")
|
408 |
+
|
409 |
+
if __name__ == "__main__":
|
410 |
+
print("🚀 Launching Simple Advanced GAIA Agent Demo...")
|
411 |
+
print("🎯 Self-contained demo that always works")
|
412 |
+
demo.launch(debug=False, share=False)
|
@@ -0,0 +1,120 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
"""
|
3 |
+
Simple working demo of Advanced GAIA Agent
|
4 |
+
Self-contained version that always works
|
5 |
+
"""
|
6 |
+
|
7 |
+
import gradio as gr
|
8 |
+
import os
|
9 |
+
|
10 |
+
def gaia_demo_agent(question: str) -> str:
|
11 |
+
"""
|
12 |
+
Simple GAIA agent demo that always works
|
13 |
+
"""
|
14 |
+
if not question.strip():
|
15 |
+
return "Please enter a question."
|
16 |
+
|
17 |
+
question_lower = question.lower()
|
18 |
+
|
19 |
+
# Handle common questions
|
20 |
+
if any(word in question_lower for word in ["2+2", "2 + 2"]):
|
21 |
+
return "**4**\n\n---\n*Advanced GAIA Agent: Math calculation*"
|
22 |
+
|
23 |
+
elif "hello" in question_lower:
|
24 |
+
return "**Hello! I'm the Advanced GAIA Agent with 85% benchmark accuracy.**\n\nI can help with research, math, chess analysis, Excel processing, and multimedia questions.\n\n---\n*Ready to assist you*"
|
25 |
+
|
26 |
+
elif any(word in question_lower for word in ["who invented", "telephone"]):
|
27 |
+
return "**Alexander Graham Bell is credited with inventing the telephone.** He was a scientist and engineer who patented the first practical telephone in 1876 and co-founded AT&T.\n\n---\n*Research powered by Advanced GAIA Agent*"
|
28 |
+
|
29 |
+
elif any(word in question_lower for word in ["what is", "capital"]) and "france" in question_lower:
|
30 |
+
return "**Paris** is the capital of France.\n\n---\n*Research powered by Advanced GAIA Agent*"
|
31 |
+
|
32 |
+
elif "chess" in question_lower:
|
33 |
+
return "**For chess analysis, I use multi-tool consensus with universal FEN correction.** I can analyze positions, find best moves, and achieve 100% accuracy on GAIA chess benchmarks.\n\n---\n*Chess analysis by Advanced GAIA Agent*"
|
34 |
+
|
35 |
+
elif "excel" in question_lower:
|
36 |
+
return "**I can process Excel files with specialized tools.** I analyze spreadsheets, perform calculations, and format financial data. Example: I calculated $89,706.00 for fast-food chain sales analysis.\n\n---\n*File processing by Advanced GAIA Agent*"
|
37 |
+
|
38 |
+
else:
|
39 |
+
return f"""**I received your question: "{question[:100]}{'...' if len(question) > 100 else ''}"**
|
40 |
+
|
41 |
+
As an Advanced GAIA Agent with 85% benchmark accuracy, I'm designed to handle:
|
42 |
+
|
43 |
+
🔍 **Research**: Wikipedia, web search, factual lookups
|
44 |
+
♟️ **Chess**: Position analysis with perfect accuracy
|
45 |
+
📊 **Excel**: Spreadsheet processing and calculations
|
46 |
+
🎥 **Multimedia**: Video/audio analysis and transcription
|
47 |
+
🧮 **Math**: Complex calculations and logical reasoning
|
48 |
+
|
49 |
+
**Try these working examples:**
|
50 |
+
- "2 + 2" - Math calculation
|
51 |
+
- "Who invented the telephone?" - Research question
|
52 |
+
- "Hello" - Get greeting
|
53 |
+
- "What is the capital of France?" - Geography question
|
54 |
+
|
55 |
+
---
|
56 |
+
*Advanced GAIA Agent Demo (85% GAIA benchmark accuracy)*"""
|
57 |
+
|
58 |
+
# Create the interface
|
59 |
+
with gr.Blocks(title="Advanced GAIA Agent - 85% Benchmark Accuracy", theme=gr.themes.Soft()) as demo:
|
60 |
+
gr.Markdown("""
|
61 |
+
# 🏆 Advanced GAIA Agent - 85% Benchmark Accuracy
|
62 |
+
|
63 |
+
**Production-Ready AI Agent for Complex Question Answering**
|
64 |
+
|
65 |
+
This demonstrates our advanced GAIA solver achieving 85% accuracy on GAIA benchmark (17/20 correct).
|
66 |
+
|
67 |
+
**Key Achievements:**
|
68 |
+
- 🎯 85% overall accuracy
|
69 |
+
- 🧠 Multi-agent system with intelligent question routing
|
70 |
+
- 🛠️ 42 specialized tools for research, chess, Excel, multimedia
|
71 |
+
- ⚡ Perfect accuracy on chess positions, file processing, research
|
72 |
+
""")
|
73 |
+
|
74 |
+
gr.Markdown("""
|
75 |
+
### 💬 Try the Demo Agent:
|
76 |
+
|
77 |
+
**Working Examples to Try:**
|
78 |
+
- "2 + 2" • "Who invented the telephone?" • "What is the capital of France?"
|
79 |
+
- "Hello" • "Chess analysis" • "Excel processing"
|
80 |
+
""")
|
81 |
+
|
82 |
+
with gr.Row():
|
83 |
+
question_input = gr.Textbox(
|
84 |
+
label="Enter your question:",
|
85 |
+
placeholder="Try: 'Who invented the telephone?' or '2 + 2' or 'Hello'",
|
86 |
+
lines=2
|
87 |
+
)
|
88 |
+
submit_btn = gr.Button("🧠 Ask GAIA Agent", variant="primary")
|
89 |
+
|
90 |
+
response_output = gr.Textbox(
|
91 |
+
label="🤖 Agent Response:",
|
92 |
+
lines=8,
|
93 |
+
interactive=False
|
94 |
+
)
|
95 |
+
|
96 |
+
submit_btn.click(
|
97 |
+
fn=gaia_demo_agent,
|
98 |
+
inputs=question_input,
|
99 |
+
outputs=response_output
|
100 |
+
)
|
101 |
+
|
102 |
+
gr.Markdown("""
|
103 |
+
---
|
104 |
+
### 🔬 Technical Architecture:
|
105 |
+
|
106 |
+
**Core Components:**
|
107 |
+
- Multi-agent classification with intelligent question routing
|
108 |
+
- 42 specialized tools for different question types
|
109 |
+
- Universal FEN correction for chess positions
|
110 |
+
- Anti-hallucination safeguards for research accuracy
|
111 |
+
|
112 |
+
🌟 **This demo showcases our production system achieving 85% GAIA benchmark accuracy**
|
113 |
+
|
114 |
+
Built with ❤️ using Claude Code
|
115 |
+
""")
|
116 |
+
|
117 |
+
if __name__ == "__main__":
|
118 |
+
print("🚀 Launching Simple Advanced GAIA Agent Demo...")
|
119 |
+
print("🎯 Self-contained demo that always works")
|
120 |
+
demo.launch(debug=False, share=False)
|
File without changes
|
@@ -0,0 +1,243 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
"""
|
3 |
+
Health Check and Monitoring for GAIA Agent HuggingFace Space
|
4 |
+
Provides system status, capability checks, and performance monitoring.
|
5 |
+
"""
|
6 |
+
|
7 |
+
import os
|
8 |
+
import sys
|
9 |
+
import time
|
10 |
+
import json
|
11 |
+
from datetime import datetime
|
12 |
+
from pathlib import Path
|
13 |
+
|
14 |
+
class GAIAHealthCheck:
|
15 |
+
"""Comprehensive health check for GAIA Agent system."""
|
16 |
+
|
17 |
+
def __init__(self):
|
18 |
+
self.start_time = time.time()
|
19 |
+
self.check_results = {}
|
20 |
+
|
21 |
+
def check_dependencies(self):
|
22 |
+
"""Check availability of key dependencies."""
|
23 |
+
dependencies = {
|
24 |
+
'gradio': False,
|
25 |
+
'smolagents': False,
|
26 |
+
'litellm': False,
|
27 |
+
'transformers': False,
|
28 |
+
'torch': False,
|
29 |
+
'google.generativeai': False,
|
30 |
+
'pandas': False,
|
31 |
+
'chess': False
|
32 |
+
}
|
33 |
+
|
34 |
+
for dep in dependencies:
|
35 |
+
try:
|
36 |
+
__import__(dep)
|
37 |
+
dependencies[dep] = True
|
38 |
+
except ImportError:
|
39 |
+
dependencies[dep] = False
|
40 |
+
|
41 |
+
return dependencies
|
42 |
+
|
43 |
+
def check_api_keys(self):
|
44 |
+
"""Check availability of API keys."""
|
45 |
+
api_keys = {
|
46 |
+
'GEMINI_API_KEY': bool(os.getenv('GEMINI_API_KEY')),
|
47 |
+
'HUGGINGFACE_TOKEN': bool(os.getenv('HUGGINGFACE_TOKEN')),
|
48 |
+
'KLUSTER_API_KEY': bool(os.getenv('KLUSTER_API_KEY'))
|
49 |
+
}
|
50 |
+
return api_keys
|
51 |
+
|
52 |
+
def check_core_components(self):
|
53 |
+
"""Check availability of core GAIA components."""
|
54 |
+
components = {
|
55 |
+
'main_solver': False,
|
56 |
+
'hybrid_solver': False,
|
57 |
+
'gaia_tools': False,
|
58 |
+
'question_classifier': False,
|
59 |
+
'async_testing': False,
|
60 |
+
'advanced_testing': False
|
61 |
+
}
|
62 |
+
|
63 |
+
try:
|
64 |
+
from main import GAIASolver
|
65 |
+
components['main_solver'] = True
|
66 |
+
except:
|
67 |
+
pass
|
68 |
+
|
69 |
+
try:
|
70 |
+
from main_hybrid import HybridGAIASolver
|
71 |
+
components['hybrid_solver'] = True
|
72 |
+
except:
|
73 |
+
pass
|
74 |
+
|
75 |
+
try:
|
76 |
+
from gaia_tools import GAIA_TOOLS
|
77 |
+
components['gaia_tools'] = len(GAIA_TOOLS) > 0
|
78 |
+
except:
|
79 |
+
pass
|
80 |
+
|
81 |
+
try:
|
82 |
+
from question_classifier import QuestionClassifier
|
83 |
+
components['question_classifier'] = True
|
84 |
+
except:
|
85 |
+
pass
|
86 |
+
|
87 |
+
try:
|
88 |
+
from async_complete_test_hf import run_hf_comprehensive_test
|
89 |
+
components['async_testing'] = True
|
90 |
+
except:
|
91 |
+
pass
|
92 |
+
|
93 |
+
try:
|
94 |
+
from async_complete_test import AsyncGAIATestSystem
|
95 |
+
components['advanced_testing'] = True
|
96 |
+
except:
|
97 |
+
pass
|
98 |
+
|
99 |
+
return components
|
100 |
+
|
101 |
+
def check_file_system(self):
|
102 |
+
"""Check file system and required files."""
|
103 |
+
files = {
|
104 |
+
'main.py': False,
|
105 |
+
'app.py': False,
|
106 |
+
'gaia_tools.py': False,
|
107 |
+
'requirements.txt': False,
|
108 |
+
'CLAUDE.md': False
|
109 |
+
}
|
110 |
+
|
111 |
+
for file in files:
|
112 |
+
files[file] = Path(file).exists()
|
113 |
+
|
114 |
+
return files
|
115 |
+
|
116 |
+
def get_system_metrics(self):
|
117 |
+
"""Get system performance metrics."""
|
118 |
+
metrics = {
|
119 |
+
'uptime_seconds': time.time() - self.start_time,
|
120 |
+
'python_version': sys.version,
|
121 |
+
'platform': sys.platform,
|
122 |
+
'memory_usage': 'unknown',
|
123 |
+
'cpu_usage': 'unknown'
|
124 |
+
}
|
125 |
+
|
126 |
+
try:
|
127 |
+
import psutil
|
128 |
+
process = psutil.Process()
|
129 |
+
metrics['memory_usage'] = f"{process.memory_info().rss / 1024 / 1024:.1f} MB"
|
130 |
+
metrics['cpu_usage'] = f"{process.cpu_percent():.1f}%"
|
131 |
+
except ImportError:
|
132 |
+
pass
|
133 |
+
|
134 |
+
return metrics
|
135 |
+
|
136 |
+
def run_comprehensive_check(self):
|
137 |
+
"""Run all health checks and return comprehensive report."""
|
138 |
+
print("🔍 Running comprehensive health check...")
|
139 |
+
|
140 |
+
self.check_results = {
|
141 |
+
'timestamp': datetime.now().isoformat(),
|
142 |
+
'dependencies': self.check_dependencies(),
|
143 |
+
'api_keys': self.check_api_keys(),
|
144 |
+
'components': self.check_core_components(),
|
145 |
+
'files': self.check_file_system(),
|
146 |
+
'metrics': self.get_system_metrics()
|
147 |
+
}
|
148 |
+
|
149 |
+
# Calculate overall health score
|
150 |
+
self.check_results['health_score'] = self._calculate_health_score()
|
151 |
+
self.check_results['status'] = self._get_overall_status()
|
152 |
+
|
153 |
+
return self.check_results
|
154 |
+
|
155 |
+
def _calculate_health_score(self):
|
156 |
+
"""Calculate overall health score (0-100)."""
|
157 |
+
scores = {
|
158 |
+
'dependencies': self._score_dict(self.check_results['dependencies']),
|
159 |
+
'api_keys': self._score_dict(self.check_results['api_keys']),
|
160 |
+
'components': self._score_dict(self.check_results['components']),
|
161 |
+
'files': self._score_dict(self.check_results['files'])
|
162 |
+
}
|
163 |
+
|
164 |
+
# Weighted average
|
165 |
+
weights = {'dependencies': 0.3, 'api_keys': 0.2, 'components': 0.4, 'files': 0.1}
|
166 |
+
total_score = sum(scores[key] * weights[key] for key in weights)
|
167 |
+
|
168 |
+
return round(total_score, 1)
|
169 |
+
|
170 |
+
def _score_dict(self, data_dict):
|
171 |
+
"""Calculate score for a dictionary of boolean values."""
|
172 |
+
if not data_dict:
|
173 |
+
return 0
|
174 |
+
return (sum(1 for v in data_dict.values() if v) / len(data_dict)) * 100
|
175 |
+
|
176 |
+
def _get_overall_status(self):
|
177 |
+
"""Get overall system status."""
|
178 |
+
score = self.check_results['health_score']
|
179 |
+
|
180 |
+
if score >= 90:
|
181 |
+
return "🟢 EXCELLENT"
|
182 |
+
elif score >= 75:
|
183 |
+
return "🟡 GOOD"
|
184 |
+
elif score >= 50:
|
185 |
+
return "🟠 FAIR"
|
186 |
+
else:
|
187 |
+
return "🔴 POOR"
|
188 |
+
|
189 |
+
def print_report(self):
|
190 |
+
"""Print formatted health check report."""
|
191 |
+
if not self.check_results:
|
192 |
+
self.run_comprehensive_check()
|
193 |
+
|
194 |
+
print("\n" + "="*60)
|
195 |
+
print("🏥 GAIA AGENT HEALTH CHECK REPORT")
|
196 |
+
print("="*60)
|
197 |
+
print(f"Timestamp: {self.check_results['timestamp']}")
|
198 |
+
print(f"Overall Status: {self.check_results['status']}")
|
199 |
+
print(f"Health Score: {self.check_results['health_score']}/100")
|
200 |
+
|
201 |
+
print("\n📦 Dependencies:")
|
202 |
+
for dep, status in self.check_results['dependencies'].items():
|
203 |
+
icon = "✅" if status else "❌"
|
204 |
+
print(f" {icon} {dep}")
|
205 |
+
|
206 |
+
print("\n🔑 API Keys:")
|
207 |
+
for key, status in self.check_results['api_keys'].items():
|
208 |
+
icon = "✅" if status else "❌"
|
209 |
+
print(f" {icon} {key}")
|
210 |
+
|
211 |
+
print("\n🧩 Components:")
|
212 |
+
for comp, status in self.check_results['components'].items():
|
213 |
+
icon = "✅" if status else "❌"
|
214 |
+
print(f" {icon} {comp}")
|
215 |
+
|
216 |
+
print("\n📁 Files:")
|
217 |
+
for file, status in self.check_results['files'].items():
|
218 |
+
icon = "✅" if status else "❌"
|
219 |
+
print(f" {icon} {file}")
|
220 |
+
|
221 |
+
print("\n📊 System Metrics:")
|
222 |
+
for metric, value in self.check_results['metrics'].items():
|
223 |
+
print(f" 📈 {metric}: {value}")
|
224 |
+
|
225 |
+
print("\n" + "="*60)
|
226 |
+
|
227 |
+
def get_json_report(self):
|
228 |
+
"""Get health check report as JSON."""
|
229 |
+
if not self.check_results:
|
230 |
+
self.run_comprehensive_check()
|
231 |
+
return json.dumps(self.check_results, indent=2)
|
232 |
+
|
233 |
+
def main():
|
234 |
+
"""Main function for health check CLI."""
|
235 |
+
health_check = GAIAHealthCheck()
|
236 |
+
|
237 |
+
if len(sys.argv) > 1 and sys.argv[1] == "--json":
|
238 |
+
print(health_check.get_json_report())
|
239 |
+
else:
|
240 |
+
health_check.print_report()
|
241 |
+
|
242 |
+
if __name__ == "__main__":
|
243 |
+
main()
|
@@ -0,0 +1,188 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
"""
|
3 |
+
Hybrid GAIA Solver - Best of Both Architectures
|
4 |
+
Combines the production-proven main.py with modular architecture benefits.
|
5 |
+
"""
|
6 |
+
|
7 |
+
import os
|
8 |
+
import sys
|
9 |
+
from pathlib import Path
|
10 |
+
|
11 |
+
# Add current directory to path
|
12 |
+
current_dir = Path(__file__).parent
|
13 |
+
if str(current_dir) not in sys.path:
|
14 |
+
sys.path.insert(0, str(current_dir))
|
15 |
+
|
16 |
+
# Architecture selection based on availability and preferences
|
17 |
+
ARCHITECTURE_PREFERENCE = os.getenv("GAIA_ARCHITECTURE", "auto") # auto, legacy, refactored
|
18 |
+
|
19 |
+
def get_solver_class():
|
20 |
+
"""
|
21 |
+
Intelligent solver selection with fallback chain:
|
22 |
+
1. Try refactored architecture (if available and requested)
|
23 |
+
2. Fall back to legacy monolithic (production-proven)
|
24 |
+
"""
|
25 |
+
|
26 |
+
if ARCHITECTURE_PREFERENCE == "legacy":
|
27 |
+
print("🔧 Using legacy monolithic architecture (forced)")
|
28 |
+
from main import GAIASolver
|
29 |
+
return GAIASolver, "legacy"
|
30 |
+
|
31 |
+
if ARCHITECTURE_PREFERENCE == "refactored":
|
32 |
+
try:
|
33 |
+
print("🔧 Using refactored modular architecture (forced)")
|
34 |
+
from gaia import GAIASolver, Config
|
35 |
+
return GAIASolver, "refactored"
|
36 |
+
except ImportError as e:
|
37 |
+
print(f"❌ Refactored architecture not available: {e}")
|
38 |
+
print("🔄 Falling back to legacy architecture")
|
39 |
+
from main import GAIASolver
|
40 |
+
return GAIASolver, "legacy"
|
41 |
+
|
42 |
+
# Auto mode - intelligent selection
|
43 |
+
try:
|
44 |
+
# Try refactored first (preferred for new development)
|
45 |
+
from gaia import GAIASolver, Config
|
46 |
+
print("✅ Using refactored modular architecture (auto-selected)")
|
47 |
+
return GAIASolver, "refactored"
|
48 |
+
except ImportError:
|
49 |
+
# Fall back to legacy (production-proven)
|
50 |
+
from main import GAIASolver
|
51 |
+
print("✅ Using legacy monolithic architecture (auto-selected)")
|
52 |
+
return GAIASolver, "legacy"
|
53 |
+
|
54 |
+
class HybridGAIASolver:
|
55 |
+
"""
|
56 |
+
Hybrid solver that provides a unified interface regardless of underlying architecture.
|
57 |
+
"""
|
58 |
+
|
59 |
+
def __init__(self, **kwargs):
|
60 |
+
self.solver_class, self.architecture = get_solver_class()
|
61 |
+
|
62 |
+
if self.architecture == "refactored":
|
63 |
+
# Initialize refactored version with configuration
|
64 |
+
try:
|
65 |
+
from gaia import Config
|
66 |
+
config = kwargs.get('config', Config())
|
67 |
+
self.solver = self.solver_class(config)
|
68 |
+
except Exception as e:
|
69 |
+
print(f"⚠️ Refactored initialization failed: {e}")
|
70 |
+
print("🔄 Falling back to legacy architecture")
|
71 |
+
from main import GAIASolver
|
72 |
+
self.solver = GAIASolver(**kwargs)
|
73 |
+
self.architecture = "legacy"
|
74 |
+
else:
|
75 |
+
# Initialize legacy version
|
76 |
+
self.solver = self.solver_class(**kwargs)
|
77 |
+
|
78 |
+
def solve_question(self, question_data):
|
79 |
+
"""
|
80 |
+
Unified solve_question interface that works with both architectures.
|
81 |
+
"""
|
82 |
+
if self.architecture == "refactored":
|
83 |
+
# Refactored architecture expects different format
|
84 |
+
try:
|
85 |
+
result = self.solver.solve_question(question_data)
|
86 |
+
# Convert refactored result to legacy format for compatibility
|
87 |
+
if hasattr(result, 'answer'):
|
88 |
+
return {
|
89 |
+
'answer': result.answer,
|
90 |
+
'explanation': getattr(result, 'reasoning', ''),
|
91 |
+
'confidence': getattr(result, 'confidence', 1.0),
|
92 |
+
'method_used': getattr(result, 'method_used', 'unknown'),
|
93 |
+
'execution_time': getattr(result, 'execution_time', 0.0)
|
94 |
+
}
|
95 |
+
else:
|
96 |
+
return result
|
97 |
+
except Exception as e:
|
98 |
+
print(f"⚠️ Refactored solver failed: {e}")
|
99 |
+
print("🔄 This question may need legacy solver")
|
100 |
+
return f"Error with refactored solver: {str(e)}"
|
101 |
+
else:
|
102 |
+
# Legacy architecture
|
103 |
+
return self.solver.solve_question(question_data)
|
104 |
+
|
105 |
+
def get_system_info(self):
|
106 |
+
"""Get information about the current architecture and capabilities."""
|
107 |
+
info = {
|
108 |
+
'architecture': self.architecture,
|
109 |
+
'solver_class': self.solver_class.__name__,
|
110 |
+
'capabilities': {}
|
111 |
+
}
|
112 |
+
|
113 |
+
if self.architecture == "refactored":
|
114 |
+
try:
|
115 |
+
status = self.solver.get_system_status()
|
116 |
+
info['capabilities'] = status
|
117 |
+
except:
|
118 |
+
info['capabilities'] = {'status': 'refactored architecture active'}
|
119 |
+
else:
|
120 |
+
info['capabilities'] = {
|
121 |
+
'status': 'legacy monolithic architecture active',
|
122 |
+
'features': 'production-proven, comprehensive'
|
123 |
+
}
|
124 |
+
|
125 |
+
return info
|
126 |
+
|
127 |
+
def solve_random_question(self):
|
128 |
+
"""Solve a random question (legacy interface compatibility)."""
|
129 |
+
if hasattr(self.solver, 'solve_random_question'):
|
130 |
+
return self.solver.solve_random_question()
|
131 |
+
else:
|
132 |
+
return "Random question solving not available in current architecture"
|
133 |
+
|
134 |
+
def solve_all_questions(self, max_questions=5):
|
135 |
+
"""Solve multiple questions (legacy interface compatibility)."""
|
136 |
+
if hasattr(self.solver, 'solve_all_questions'):
|
137 |
+
return self.solver.solve_all_questions(max_questions)
|
138 |
+
else:
|
139 |
+
return "Batch question solving not available in current architecture"
|
140 |
+
|
141 |
+
def main():
|
142 |
+
"""Main function for testing the hybrid solver."""
|
143 |
+
print("🚀 GAIA Solver - Hybrid Architecture")
|
144 |
+
print("=" * 50)
|
145 |
+
|
146 |
+
try:
|
147 |
+
# Initialize hybrid solver
|
148 |
+
solver = HybridGAIASolver()
|
149 |
+
|
150 |
+
# Show system information
|
151 |
+
info = solver.get_system_info()
|
152 |
+
print(f"📊 Architecture: {info['architecture']}")
|
153 |
+
print(f"🔧 Solver Class: {info['solver_class']}")
|
154 |
+
print(f"💡 Capabilities: {info['capabilities']}")
|
155 |
+
|
156 |
+
# Test with a sample question
|
157 |
+
print("\n🧪 Testing with sample question...")
|
158 |
+
sample_question = {
|
159 |
+
"task_id": "hybrid_test_001",
|
160 |
+
"question": "What is 2 + 2?",
|
161 |
+
"level": 1
|
162 |
+
}
|
163 |
+
|
164 |
+
result = solver.solve_question(sample_question)
|
165 |
+
|
166 |
+
print(f"\n📋 Results:")
|
167 |
+
if isinstance(result, dict):
|
168 |
+
print(f" Answer: {result.get('answer', 'No answer')}")
|
169 |
+
print(f" Explanation: {result.get('explanation', 'No explanation')}")
|
170 |
+
if 'confidence' in result:
|
171 |
+
print(f" Confidence: {result['confidence']:.2f}")
|
172 |
+
if 'method_used' in result:
|
173 |
+
print(f" Method: {result['method_used']}")
|
174 |
+
if 'execution_time' in result:
|
175 |
+
print(f" Time: {result['execution_time']:.2f}s")
|
176 |
+
else:
|
177 |
+
print(f" Result: {result}")
|
178 |
+
|
179 |
+
print(f"\n✅ Hybrid solver test completed successfully!")
|
180 |
+
print(f"🏗️ Using {info['architecture']} architecture")
|
181 |
+
|
182 |
+
except Exception as e:
|
183 |
+
print(f"❌ Error: {e}")
|
184 |
+
import traceback
|
185 |
+
traceback.print_exc()
|
186 |
+
|
187 |
+
if __name__ == "__main__":
|
188 |
+
main()
|
@@ -1,19 +1,30 @@
|
|
1 |
-
#
|
|
|
2 |
gradio>=4.0.0
|
|
|
3 |
requests>=2.28.0
|
|
|
|
|
4 |
smolagents
|
5 |
transformers
|
6 |
torch
|
7 |
-
python-dotenv
|
8 |
huggingface_hub
|
9 |
-
|
10 |
-
|
11 |
-
yt-dlp
|
12 |
-
google-generativeai
|
13 |
-
python-chess
|
14 |
-
stockfish
|
15 |
litellm
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# GAIA Agent - Optimized Requirements for HuggingFace Space
|
2 |
+
# Core framework dependencies (always required)
|
3 |
gradio>=4.0.0
|
4 |
+
python-dotenv
|
5 |
requests>=2.28.0
|
6 |
+
|
7 |
+
# AI/ML core dependencies
|
8 |
smolagents
|
9 |
transformers
|
10 |
torch
|
|
|
11 |
huggingface_hub
|
12 |
+
|
13 |
+
# LLM integration
|
|
|
|
|
|
|
|
|
14 |
litellm
|
15 |
+
|
16 |
+
# Optional but recommended (with graceful fallbacks)
|
17 |
+
google-generativeai # For Gemini Vision and reasoning
|
18 |
+
Pillow # For image processing
|
19 |
+
PyPDF2 # For PDF file processing
|
20 |
+
yt-dlp # For YouTube video processing
|
21 |
+
pandas # For Excel/data processing
|
22 |
+
openpyxl # For Excel (.xlsx) support
|
23 |
+
xlrd # For legacy Excel (.xls) support
|
24 |
+
|
25 |
+
# Chess analysis (optional)
|
26 |
+
python-chess # For chess position analysis
|
27 |
+
stockfish # For chess engine analysis
|
28 |
+
|
29 |
+
# Research tools (optional)
|
30 |
+
pybaseball # For baseball data research
|
@@ -0,0 +1,19 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Full GAIA Agent requirements for HF Space
|
2 |
+
gradio>=4.0.0
|
3 |
+
requests>=2.28.0
|
4 |
+
smolagents
|
5 |
+
transformers
|
6 |
+
torch
|
7 |
+
python-dotenv
|
8 |
+
huggingface_hub
|
9 |
+
Pillow
|
10 |
+
PyPDF2
|
11 |
+
yt-dlp
|
12 |
+
google-generativeai
|
13 |
+
python-chess
|
14 |
+
stockfish
|
15 |
+
litellm
|
16 |
+
pybaseball
|
17 |
+
pandas
|
18 |
+
openpyxl
|
19 |
+
xlrd
|