Hao Xu committed on
Commit · 73b542b
1 Parent(s): 6e404e3
add documentation
app.py CHANGED
@@ -224,10 +224,29 @@ def record_submission(benchmark_name, contributor, jsonl_file, hf_path, hf_split
 
 
 with gr.Blocks() as interface:
-    gr.
+    gr.HTML(
+        '''<h1 style="text-align: center;">Benchmark Contamination Monitoring System</h1>
+
+        <p style='font-size: 16px;'>This system monitors potential contamination in benchmark datasets used for evaluating language models across various open-source corpora.</p>
+        <p style='font-size: 16px;'>The system is released along with our paper Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index, which documents the methodology and findings in detail.</p>
+        <p style='font-size: 16px;'>We invite the community to contribute by submitting new benchmarks for contamination analysis using the form available in the <b>"Add New Benchmarks"</b> tab.</p>
+        '''
+    )
 
     with gr.Tabs():
         with gr.Tab(label="Bulletin"):
+            gr.Markdown("## Benchmark Contamination Bulletin")
+            with gr.Accordion(label='Click to view instructions', open=False):
+                gr.Markdown('''
+                The **Benchmark Contamination Bulletin** presents contamination statistics for evaluation benchmarks across different data sources.
+
+                - Benchmarks analyzed in our accompanying paper are listed under the **core** source.
+                - User-submitted benchmarks appear under the **community** source.
+                - The contamination rate represents the percentage of benchmark entries identified as *dirty* based on our detection criteria.
+                - The bulletin will be updated regularly to include contamination checks on newly released Common Crawl dumps.
+                - You can sort the results by clicking on the column headers.
+                ''')
+
             source_radio = gr.Radio(
                 choices=["core", "community"],
                 label="Select Benchmark Source",
@@ -253,7 +272,27 @@ with gr.Blocks() as interface:
             )
 
         with gr.Tab(label="Add New Benchmarks"):
-            gr.Markdown(
+            gr.Markdown('''
+            ## Add Your Own Benchmarks for Contamination Checking
+
+            You can use this form to submit a benchmark for contamination checking. Submissions may include either a direct upload or a reference to a publicly available dataset on Hugging Face.
+
+            ### Submission Guidelines:
+            - **Benchmark Name**: Provide a name for your benchmark.
+            - **Contributor**: Enter your name or affiliation.
+            - **Data Source**:
+                - Upload a `.jsonl` file containing your benchmark entries, or
+                - Specify a Hugging Face dataset path (`author/benchmark-name`) along with the appropriate split (e.g., `test`, `validation`).
+            - **Field Name**: Indicate the field to analyze for contamination:
+                - For question-answering datasets: use the question field.
+                - For language understanding tasks: use the context or passage field.
+
+            ### What Happens Next:
+            Once submitted, your benchmark will be queued for analysis. Results will be published in the **community** section of the bulletin.
+
+            Processing time may vary depending on the dataset format and size. You can check the results by navigating to the **Bulletin** tab and selecting the **community** source, then clicking **Refresh**.
+            ''')
+
 
             with gr.Row():
                 benchmark_name_input = gr.Textbox(label="Benchmark Name")
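The submission guidelines added in this commit describe a `.jsonl` upload where each line is one benchmark entry and a chosen field (e.g. the question) is scanned for contamination. The sketch below illustrates what such a file looks like and how its entries might be read out; the helper name `load_entries_from_jsonl` is hypothetical and not part of the actual `app.py`.

```python
# Hypothetical sketch of reading a submitted .jsonl benchmark file,
# assuming one JSON object per line and a user-specified field name,
# as described in the "Add New Benchmarks" submission guidelines.
# The helper below is illustrative only, not taken from app.py.
import json


def load_entries_from_jsonl(path, field_name):
    """Read one benchmark entry per line and extract the chosen field."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            entries.append(record[field_name])
    return entries


# Example: a QA-style benchmark where the "question" field is checked.
sample = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare"},
]
with open("demo_benchmark.jsonl", "w", encoding="utf-8") as f:
    for row in sample:
        f.write(json.dumps(row) + "\n")

print(load_entries_from_jsonl("demo_benchmark.jsonl", "question"))
```

For a Hugging Face-hosted dataset, the equivalent would be loading the named split (e.g. `test`) and selecting the same field from each row.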