technique
Cross Activation Shift Distance
technique·active·provisional
cross-activation-shift-distance-50aad589·1 event·first seen 13d ago
Aliases: Cross Activation Shift Distance
Recent events (1)
Backdoor unlearning in LLMs generalizes across unknown triggers via cross-backdoor transfer
Researchers demonstrate that training an LLM to unlearn a single backdoor trigger can suppress other backdoors that were never explicitly targeted, a phenomenon they call cross-backdoor transfer. The study spans three model families with backdoors injected via pretraining or continual pretraining, and introduces a new metric called Cross Activation Shift Distance to quantify the relationship between different unlearning interventions. The finding opens a potential defensive strategy where defenders deliberately inject and then remove controlled backdoors to suppress unknown attacker-planted backdoors.