"""
Text content for the CodeReview Bench Leaderboard.
"""
TITLE = """
<div style="text-align: center; margin-bottom: 1rem">
<h1>CodeReview Bench Leaderboard</h1>
</div>
"""
INTRODUCTION_TEXT = """
## Introduction
CodeReview Bench is a comprehensive benchmark for evaluating the quality and effectiveness of automated code review systems.
This leaderboard tracks model performance across programming languages and review criteria,
including readability, relevance, explanation clarity, and actionability.
Models are scored on their ability to produce code reviews that are helpful, accurate,
and actionable.
"""
LLM_BENCHMARKS_TEXT = """
CodeReview Bench is a comprehensive benchmark for evaluating automated code review systems across programming languages and review quality dimensions.
It evaluates models on their ability to provide high-quality code reviews, using both LLM-based multimetric evaluation (readability, relevance, explanation clarity, problem identification, actionability, completeness, specificity, contextual adequacy, consistency, brevity) and the exact-match metrics (pass@1, pass@5, pass@10) presented in our paper.
The benchmark supports review comments in both Russian and English across four programming languages: Python, Java, Go, and Scala.
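As a reference point, the exact-match scores can be reproduced with the standard unbiased pass@k estimator. The sketch below assumes the common definition of pass@k; the helper name is illustrative, not part of the framework's API:
```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Probability that at least one of k samples drawn from n
    # generated reviews (c of them exact matches) is correct.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

pass_at_k(n=10, c=3, k=5)  # ~0.917
```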
"""
EVALUATION_QUEUE_TEXT = """
## Submit Your Model
To add your model to the CodeReview Bench leaderboard:
1. Run your evaluation using the CodeReview Bench framework.
2. Upload your results in `.jsonl` format using this form.
3. Once validated, your model will appear on the leaderboard.
### Requirements:
- Results must include all required metrics: LLM-based multimetric scores and exact-match (pass@k) metrics (an example record is shown below)
- Submissions should cover multiple programming languages where applicable
- Review comments may be in either Russian or English
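A minimal sketch of writing one result record, assuming illustrative field names (the exact schema is defined by the CodeReview Bench framework, not by this example):
```python
import json

# Hypothetical record; the field names below are illustrative assumptions.
record = {
    "model": "my-model",
    "language": "python",
    "comment_language": "en",
    "readability": 4.2,
    "relevance": 4.5,
    "actionability": 3.9,
    "pass@1": 0.31,
    "pass@5": 0.52,
    "pass@10": 0.61,
}
with open("results.jsonl", "a") as f:
    print(json.dumps(record), file=f)  # one JSON object per line
```
Each line of the uploaded file should be one self-contained JSON object like the record above.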
### ✉️✨ Ready? Upload your results below!
"""
CITATION_BUTTON_LABEL = "Cite CodeReview Bench"
CITATION_BUTTON_TEXT = """
@misc{codereviewbench2025,
  author       = {CodeReview Bench Team},
  title        = {CodeReview Bench: Comprehensive Benchmark for Automated Code Review Systems},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\\url{https://github.com/your-org/codereview-bench}}
}
"""