Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security Paper • 2507.19399 • Published 12 days ago
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators Paper • 2507.15339 • Published 16 days ago
Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation Paper • 2507.11966 • Published 21 days ago
Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications Paper • 2507.09820 • Published 23 days ago
RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages Paper • 2507.05980 • Published 29 days ago
MinorBench: A hand-built benchmark for content-based risks for children Paper • 2503.10242 • Published Mar 13
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection Paper • 2411.12946 • Published Nov 20, 2024