Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

paper · active · provisional · detect-unlearn-restore-defending-text-summarization-models-against-data-poisoning-4fa02a85 · 1 event · first seen 3d ago

Recent events (1)

5 · arXiv · cs.CL · 3d ago

Unified defense framework detects and remediates data poisoning in text summarization fine-tuning

A new arXiv preprint introduces a post-hoc defense framework for detecting and recovering from training-time data poisoning in LLMs fine-tuned for abstractive summarization. The framework uses influence-function analysis in white-box settings and behavioral perturbation auditing in black-box settings, achieving 85-92% detection precision across nine architectures and six benchmarks. Gradient-ascent unlearning restores up to 96% of original model behavior with less than 0.6% ROUGE degradation. The authors also introduce novel attacks targeting factual distortion and representational bias that evade conventional evaluation metrics.
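
The gradient-ascent unlearning step mentioned above can be illustrated with a minimal sketch. The summary does not give the authors' procedure, so everything here is an assumption made for illustration: a one-parameter regression model stands in for a summarization model, the poisoned samples are taken as already flagged by a detector, and the stopping rule (halt once loss on a retained clean set stops improving) is a hypothetical safeguard, not the paper's method.

```python
def loss(w, data):
    """Mean squared error of the toy model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def train(w, data, lr=0.05, steps=200):
    """Plain gradient descent on the given data."""
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w


def unlearn(w, flagged, retained, lr=0.005, max_steps=1000):
    """Gradient ascent on flagged samples.

    Assumed stopping rule: halt as soon as a step no longer lowers
    the loss on the retained (clean) samples.
    """
    best = loss(w, retained)
    for _ in range(max_steps):
        candidate = w + lr * grad(w, flagged)  # ascend: increase loss on poison
        candidate_loss = loss(candidate, retained)
        if candidate_loss >= best:
            break
        w, best = candidate, candidate_loss
    return w


# Toy data (hypothetical): clean samples follow y = 2x, poison follows y = -3x.
clean = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
poison = [(1.0, -3.0), (1.5, -4.5)]

w_poisoned = train(0.0, clean + poison)
w_restored = unlearn(w_poisoned, flagged=poison, retained=clean)
print(f"poisoned w = {w_poisoned:.3f}, restored w = {w_restored:.3f}")
```

Unbounded gradient ascent would eventually push the parameter past the clean solution and degrade the model, which is why this sketch checks a retained set after every step. A real LLM setting would need a comparable safeguard, though the one used here is only illustrative.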