title: OpenThoughts Benchmark Explorer
emoji: π
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: apache-2.0
OpenThoughts Evalchemy Benchmark Explorer
A comprehensive web application for exploring OpenThoughts benchmark correlations and model performance.
Features
- Interactive correlation heatmaps
- Scatter plot explorer with uncertainty analysis
- Model performance comparisons
- Statistical summaries and uncertainty analysis
Usage
The app automatically loads benchmark data and provides multiple views for analysis:
- Overview Dashboard: High-level summary of benchmarks and correlations
- Interactive Heatmap: Correlation matrix visualization
- Scatter Explorer: Detailed pairwise benchmark comparisons
- Model Performance: Individual model analysis
- Statistical Summary: Correlation statistics across methods
- Uncertainty Analysis: Measurement reliability analysis
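As a rough illustration of what "correlation statistics across methods" can mean, the sketch below computes one benchmark-by-benchmark correlation matrix per method with pandas. The table layout (one row per model, one column per benchmark), the choice of Pearson and Spearman, the function name `correlation_matrices`, and the toy scores are all assumptions for illustration; the app's actual implementation may differ.

```python
import pandas as pd


def correlation_matrices(scores: pd.DataFrame) -> dict:
    """Return one benchmark-by-benchmark correlation matrix per method.

    Assumes `scores` has one row per model and one numeric column per benchmark.
    """
    methods = ("pearson", "spearman")  # assumed set of methods, not confirmed by the app
    return {method: scores.corr(method=method) for method in methods}


# Toy data (hypothetical model and benchmark names, made-up scores).
scores = pd.DataFrame(
    {
        "bench_a": [0.1, 0.4, 0.5, 0.9],
        "bench_b": [0.2, 0.3, 0.6, 0.8],
        "bench_c": [0.9, 0.7, 0.4, 0.1],
    },
    index=["model_1", "model_2", "model_3", "model_4"],
)

matrices = correlation_matrices(scores)
print(matrices["spearman"].round(2))
```

A matrix like `matrices["pearson"]` is the kind of input a correlation heatmap view would typically plot.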
Data Files
The app reads two CSV files, the second of which is optional:
- comprehensive_benchmark_scores.csv: Main benchmark scores
- benchmark_standard_errors.csv: Standard error estimates (optional)
These files should be in the root directory of the repository.
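A minimal sketch of how the two files could be loaded, treating the standard-error file as optional. The function name `load_benchmark_data` and the assumption that the first CSV column holds the model name are hypothetical; only the two file names come from this README.

```python
from pathlib import Path
from typing import Optional, Tuple

import pandas as pd

SCORES_FILE = "comprehensive_benchmark_scores.csv"
STDERR_FILE = "benchmark_standard_errors.csv"


def load_benchmark_data(
    root: Path = Path("."),
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    """Load the scores table and, if present, the standard-error table.

    Assumes the first column of each CSV identifies the model (hypothetical layout).
    """
    scores_path = Path(root) / SCORES_FILE
    if not scores_path.exists():
        raise FileNotFoundError(f"Missing required file: {scores_path}")
    scores = pd.read_csv(scores_path, index_col=0)

    # The standard-error file is optional, so fall back to None when it is absent.
    stderr_path = Path(root) / STDERR_FILE
    stderr = pd.read_csv(stderr_path, index_col=0) if stderr_path.exists() else None
    return scores, stderr
```

Calling `load_benchmark_data()` from the repository root would return the scores table plus either the standard-error table or `None`, so views that rely on uncertainty estimates can check for `None` before drawing them.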