metadata

title: OpenThoughts Benchmark Explorer
emoji: 📊
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: apache-2.0

OpenThoughts Evalchemy Benchmark Explorer

A Streamlit web application for exploring correlations between OpenThoughts benchmarks and comparing model performance.

Features

Interactive correlation heatmaps
Scatter plot explorer with uncertainty analysis
Model performance comparisons
Statistical summaries and uncertainty analysis

Usage

On startup the app loads the benchmark data and presents six views for analysis:

Overview Dashboard: High-level summary of benchmarks and correlations
Interactive Heatmap: Correlation matrix visualization
Scatter Explorer: Detailed pairwise benchmark comparisons
Model Performance: Individual model analysis
Statistical Summary: Correlation statistics across methods
Uncertainty Analysis: Measurement reliability analysis
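
As a rough illustration of what the Interactive Heatmap view displays, the sketch below computes a pairwise Pearson correlation matrix from per-benchmark score lists. It is not taken from app.py: the benchmark names, the score values, and the helpers `pearson` and `correlation_matrix` are hypothetical, and the app may use a different correlation method or library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(scores):
    """Map each benchmark pair to its correlation across models."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# Hypothetical data: one score per model, in the same model order for every benchmark.
example_scores = {
    "bench_a": [10, 20, 30, 40],
    "bench_b": [20, 40, 60, 80],
    "bench_c": [40, 30, 20, 10],
}

matrix = correlation_matrix(example_scores)
```

In this made-up data, `bench_a` and `bench_b` rise together while `bench_c` falls, so the matrix holds strongly positive and strongly negative entries, the two extremes a heatmap color scale would show.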

Data Files

The app reads two CSV files; the first is required and the second is optional:

comprehensive_benchmark_scores.csv: Main benchmark scores
benchmark_standard_errors.csv: Standard error estimates (optional)

These files should be in the root directory of the repository.
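
A minimal sketch of loading the two files, treating the standard-error file as optional, might look like the following. Only the two file names come from this README; the function names, the `root` parameter, and the choice to return rows as dictionaries are assumptions, and the column layout of the CSV files is not specified here.

```python
import csv
from pathlib import Path

SCORES_FILE = "comprehensive_benchmark_scores.csv"
ERRORS_FILE = "benchmark_standard_errors.csv"


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def load_benchmark_data(root="."):
    """Return (scores, errors); errors is None when the optional file is absent."""
    root = Path(root)
    scores_path = root / SCORES_FILE
    if not scores_path.exists():
        raise FileNotFoundError(f"Required file not found: {scores_path}")
    scores = read_rows(scores_path)

    errors_path = root / ERRORS_FILE
    errors = read_rows(errors_path) if errors_path.exists() else None
    return scores, errors
```

Calling `load_benchmark_data()` from the repository root returns the score rows and either the standard-error rows or `None`, so downstream code can skip uncertainty views when no error estimates are available.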