Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations Paper • 2303.09289 • Published Mar 16, 2023 • 1
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation Paper • 2305.15296 • Published May 24, 2023
Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness? Paper • 2305.18398 • Published May 28, 2023 • 1
Interactively Providing Explanations for Transformer Language Models Paper • 2110.02058 • Published Sep 2, 2021
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You Paper • 2401.16092 • Published Jan 29, 2024
A Typology for Exploring the Mitigation of Shortcut Behavior Paper • 2203.03668 • Published Mar 4, 2022
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 43
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming Paper • 2404.08676 • Published Apr 6, 2024 • 3
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis Paper • 2209.08891 • Published Sep 19, 2022 • 1
Revision Transformers: Instructing Language Models to Change their Values Paper • 2210.10332 • Published Oct 19, 2022
The Stable Artist: Steering Semantics in Diffusion Latent Space Paper • 2212.06013 • Published Dec 12, 2022
LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment Paper • 2406.05113 • Published Jun 7, 2024 • 2
SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs Paper • 2411.07122 • Published Nov 11, 2024 • 1
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons Paper • 2503.05731 • Published Feb 19 • 1
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition Paper • 2505.20033 • Published about 1 month ago • 3
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published 15 days ago • 17
MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published Jan 17 • 9
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
LEDITS++: Limitless Image Editing using Text-to-Image Models Paper • 2311.16711 • Published Nov 28, 2023 • 24