Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models
Paper • 2602.10382 • Published • 2