"""
Text content for the GuardBench Leaderboard.
"""
TITLE = """
<div style="text-align: center; margin-bottom: 1rem">
<h1>GuardBench Leaderboard</h1>
</div>
"""
INTRODUCTION_TEXT = """
## Introduction

GuardBench is a comprehensive benchmark for evaluating the safety guardrails of large language models (LLMs).
This leaderboard tracks model performance across safety categories such as harmful content detection
and jailbreak resistance.

Models are evaluated on their ability to refuse harmful requests and to detect problematic content
across multiple categories and test scenarios.
"""
LLM_BENCHMARKS_TEXT = """
## GuardBench Evaluation Methodology

GuardBench evaluates models across multiple safety categories:

- **Criminal, Violent, and Terrorist Activity**
- **Manipulation, Deception, and Misinformation**
- **Creative Content Involving Illicit Themes**
- **Sexual Content and Violence**
- **Political Corruption and Legal Evasion**
- **Labor Exploitation and Human Trafficking**
- **Environmental and Industrial Harm**
- **Animal Cruelty and Exploitation**
- **Self-Harm and Suicidal Ideation**
- **Safe Prompts**

Each category is tested with:

- **Default Prompts**: Standard harmful requests
- **Jailbroken Prompts**: Adversarial prompts that attempt to bypass safety measures
- **Default Answers**: Responses to default prompts, evaluated for harmful content
- **Jailbroken Answers**: Responses to jailbreak attempts, evaluated for harmful content

Metrics include:

- **F1 Binary**: Harmonic mean of precision and recall
- **Recall Binary**: Proportion of actual harmful content correctly identified
- **Precision Binary**: Proportion of identified harmful content that was actually harmful
- **Error Ratio**: Rate of errors during evaluation
- **Average Runtime**: Average processing time in milliseconds
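
The binary metrics above can be illustrated with a minimal sketch. It assumes labels are encoded as 1 for harmful and 0 for safe; the function name `binary_metrics` and the sample data are hypothetical and are not taken from the GuardBench codebase, which may compute these values differently.

```python
# Illustrative sketch only, not the GuardBench implementation.
# Assumption: 1 = harmful, 0 = safe.
def binary_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # harmful, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # harmful, missed

    # Guard against division by zero when nothing is flagged or nothing is harmful.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {'precision': precision, 'recall': recall, 'f1': f1}


# Made-up sample data for demonstration.
labels = [1, 1, 1, 0, 0, 1]
predictions = [1, 0, 1, 1, 0, 1]
scores = binary_metrics(labels, predictions)
print(scores)
```

With this sample data, three of the four flagged items are truly harmful and three of the four harmful items are caught, so precision, recall, and F1 all equal 0.75.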
"""
EVALUATION_QUEUE_TEXT = """
## Submission Process

To submit your model results to the GuardBench leaderboard:

1. Evaluate your model using the [GuardBench framework](https://github.com/huggingface/guard-bench)
2. Format your results as a JSONL file according to our schema
3. Submit your results using the submission form with your authorized token

Results will be processed and added to the leaderboard once validated.
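
As a rough sketch of the JSONL step, the snippet below writes one JSON object per line and reads the file back. The field names (`model_name`, `category`, `f1_binary`), the values, and the file name `results.jsonl` are placeholders chosen for illustration; they are not the GuardBench schema, so check the framework for the required fields.

```python
import json

# Placeholder records: field names and values are hypothetical,
# NOT the actual GuardBench schema.
results = [
    {'model_name': 'example-guard-model', 'category': 'Safe Prompts', 'f1_binary': 0.91},
    {'model_name': 'example-guard-model', 'category': 'Animal Cruelty and Exploitation', 'f1_binary': 0.87},
]


def write_jsonl(records, path):
    # JSONL stores one complete JSON object per line.
    with open(path, 'w', encoding='utf-8') as f:
        for record in records:
            print(json.dumps(record), file=f)


def read_jsonl(path):
    # Parse each non-empty line as its own JSON object.
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


write_jsonl(results, 'results.jsonl')
loaded = read_jsonl('results.jsonl')
```

Reading the file back before submitting is a cheap way to confirm that every line parses as valid JSON.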
"""
CITATION_BUTTON_LABEL = "Cite GuardBench" | |
CITATION_BUTTON_TEXT = """
@misc{guardbench2023,
author = {GuardBench Team},
title = {GuardBench: Comprehensive Benchmark for LLM Safety Guardrails},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\\url{https://github.com/huggingface/guard-bench}}
}
"""