Hao Xu committed on
Commit · 73b542b
1 Parent(s): 6e404e3
add documentation
app.py CHANGED
@@ -224,10 +224,29 @@ def record_submission(benchmark_name, contributor, jsonl_file, hf_path, hf_split
 
 
 with gr.Blocks() as interface:
-    gr.
+    gr.HTML(
+        '''<h1 style="text-align: center;">Benchmark Contamination Monitoring System</h1>
+
+        <p style='font-size: 16px;'>This system monitors potential contamination in benchmark datasets used for evaluating language models across various open-source corpora.</p>
+        <p style='font-size: 16px;'>The system is released along with our paper Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index, which documents the methodology and findings in detail.</p>
+        <p style='font-size: 16px;'>We invite the community to contribute by submitting new benchmarks for contamination analysis using the form available in the <b>"Add New Benchmarks"</b> tab.</p>
+        '''
+    )
 
     with gr.Tabs():
         with gr.Tab(label="Bulletin"):
+            gr.Markdown("## Benchmark Contamination Bulletin")
+            with gr.Accordion(label='Click to view instructions', open=False):
+                gr.Markdown('''
+                The **Benchmark Contamination Bulletin** presents contamination statistics for evaluation benchmarks across different data sources.
+
+                - Benchmarks analyzed in our accompanying paper are listed under the **core** source.
+                - User-submitted benchmarks appear under the **community** source.
+                - The contamination rate represents the percentage of benchmark entries identified as *dirty* based on our detection criteria.
+                - The bulletin will be updated regularly to include contamination checks on newly released Common Crawl dumps.
+                - You can sort the results by clicking on the column headers.
+                ''')
+
             source_radio = gr.Radio(
                 choices=["core", "community"],
                 label="Select Benchmark Source",
@@ -253,7 +272,27 @@ with gr.Blocks() as interface:
             )
 
         with gr.Tab(label="Add New Benchmarks"):
-            gr.Markdown(
+            gr.Markdown('''
+            ## Add Your Own Benchmarks for Contamination Checking
+
+            You can use this form to submit a benchmark for contamination checking. Submissions may include either a direct upload or a reference to a publicly available dataset on Hugging Face.
+
+            ### Submission Guidelines:
+            - **Benchmark Name**: Provide a name for your benchmark.
+            - **Contributor**: Enter your name or affiliation.
+            - **Data Source**:
+                - Upload a `.jsonl` file containing your benchmark entries, or
+                - Specify a Hugging Face dataset path (`author/benchmark-name`) along with the appropriate split (e.g., `test`, `validation`).
+            - **Field Name**: Indicate the field to analyze for contamination:
+                - For question-answering datasets: use the question field.
+                - For language understanding tasks: use the context or passage field.
+
+            ### What Happens Next:
+            Once submitted, your benchmark will be queued for analysis. Results will be published in the **community** section of the bulletin.
+
+            Processing time may vary depending on the dataset format and size. You can check the results by navigating to the **Bulletin** tab and selecting the **community** source, then clicking **Refresh**.
+            ''')
+
 
             with gr.Row():
                 benchmark_name_input = gr.Textbox(label="Benchmark Name")
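The submission guidelines added in this commit describe a `.jsonl` upload where each line is one benchmark entry and a chosen field (e.g. the question) is scanned for contamination. The sketch below illustrates what such a file looks like and how its entries might be read out; the helper name `load_entries_from_jsonl` is hypothetical and not part of the actual `app.py`.

```python
# Hypothetical sketch of reading a submitted .jsonl benchmark file,
# assuming one JSON object per line and a user-specified field name,
# as described in the "Add New Benchmarks" submission guidelines.
# The helper below is illustrative only, not taken from app.py.
import json


def load_entries_from_jsonl(path, field_name):
    """Read one benchmark entry per line and extract the chosen field."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            entries.append(record[field_name])
    return entries


# Example: a QA-style benchmark where the "question" field is checked.
sample = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare"},
]
with open("demo_benchmark.jsonl", "w", encoding="utf-8") as f:
    for row in sample:
        f.write(json.dumps(row) + "\n")

print(load_entries_from_jsonl("demo_benchmark.jsonl", "question"))
```

For a Hugging Face-hosted dataset, the equivalent would be loading the named split (e.g. `test`) and selecting the same field from each row.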