title: OpenThoughts Benchmark Explorer
emoji: 📊
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: apache-2.0

OpenThoughts Evalchemy Benchmark Explorer

A Streamlit web application for exploring correlations between OpenThoughts benchmarks and comparing model performance.

Features

  • Interactive correlation heatmaps
  • Scatter plot explorer with uncertainty analysis
  • Model performance comparisons
  • Statistical summaries and uncertainty analysis

Usage

The app loads the benchmark data automatically and provides six views for analysis:

  1. Overview Dashboard: High-level summary of benchmarks and correlations
  2. Interactive Heatmap: Correlation matrix visualization
  3. Scatter Explorer: Detailed pairwise benchmark comparisons
  4. Model Performance: Individual model analysis
  5. Statistical Summary: Correlation statistics across methods
  6. Uncertainty Analysis: Measurement reliability analysis
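The correlation views above rest on pairwise correlations between benchmark columns. The following is a minimal, dependency-free sketch of that idea, not the app's actual code: it assumes the "methods" in the Statistical Summary include Pearson and Spearman correlation, and that a missing score is represented as `None`. The function names (`benchmark_correlation`, `average_ranks`, and so on) are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def benchmark_correlation(
    a: Sequence[Optional[float]],
    b: Sequence[Optional[float]],
    method: str = "pearson",
) -> float:
    """Correlate two benchmarks over the models that have both scores."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    if method == "pearson":
        return pearson(xs, ys)
    if method == "spearman":
        return spearman(xs, ys)
    raise ValueError(f"unknown method: {method}")
```

Each entry of a correlation heatmap would then be one call to `benchmark_correlation` for a pair of benchmarks, with positions in `a` and `b` corresponding to the same models. Dropping models that lack either score is one possible policy for missing data; the app may handle it differently.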

Data Files

The app reads two CSV files, of which only the first is required:

  • comprehensive_benchmark_scores.csv: Main benchmark scores
  • benchmark_standard_errors.csv: Standard error estimates (optional)

These files should be in the root directory of the repository.
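As an illustration of the required/optional split, here is a minimal sketch of a loader that fails when the scores file is missing and falls back to `None` when the standard-error file is absent. It is not the code in `app.py`. The CSV layout (first column is a model name, remaining columns are benchmarks, empty cells are missing scores) and the names `read_table` and `load_benchmark_data` are assumptions made for the example.

```python
import csv
from pathlib import Path
from typing import Dict, Optional, Tuple

SCORES_FILE = "comprehensive_benchmark_scores.csv"
STDERR_FILE = "benchmark_standard_errors.csv"

# Assumed shape: {model name: {benchmark name: score, or None if missing}}
Table = Dict[str, Dict[str, Optional[float]]]


def read_table(path: Path) -> Table:
    """Read a CSV whose first column is a row label (assumed: model name)."""
    table: Table = {}
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for row in reader:
            if not row:
                continue  # skip blank lines
            values: Dict[str, Optional[float]] = {}
            for name, cell in zip(header[1:], row[1:]):
                cell = cell.strip()
                values[name] = float(cell) if cell else None
            table[row[0]] = values
    return table


def load_benchmark_data(root: Path = Path(".")) -> Tuple[Table, Optional[Table]]:
    """Load the scores (required) and the standard errors (optional)."""
    scores_path = root / SCORES_FILE
    if not scores_path.is_file():
        raise FileNotFoundError(f"Missing required file: {scores_path}")
    scores = read_table(scores_path)

    stderr_path = root / STDERR_FILE
    stderr = read_table(stderr_path) if stderr_path.is_file() else None
    return scores, stderr
```

Calling `load_benchmark_data()` with no arguments looks in the current working directory, which matches the repository root when the app is started from there.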