Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning
paper · active · provisional
detect-unlearn-restore-defending-text-summarization-models-against-data-poisoning-4fa02a85 · 1 event · first seen 3d ago
Aliases: Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning
More like this (12)
Less is More: Quality-Aware Training Data Selection for Scientific Summarization
clinical text summarization
Learning to Summarize with Human Feedback
A Training-Free Mixture-of-Agents Framework for Multi-Document Summarization using LLMs and Knowledge Graphs
Uncertainty-based Debiasing and Unlearning for Decontamination
Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution
NAMESAKES: Probing Identity Memorization in Text-to-Image Models
Provenance-Grounded Gating and Adaptive Recovery in Synthetic Post-Training Data Curation
Language Model Safety Monitor
Recursive Summarization
Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness
Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models
Recent events (1)
Unified defense framework detects and remediates data poisoning in text summarization fine-tuning
A new arXiv preprint introduces a post-hoc defense framework for detecting and recovering from training-time data poisoning in LLMs fine-tuned for abstractive summarization. The framework uses influence-function analysis in white-box settings and behavioral perturbation auditing in black-box settings, achieving 85-92% detection precision across nine architectures and six benchmarks. Gradient-ascent unlearning restores up to 96% of original model behavior with less than 0.6% ROUGE degradation. The authors also introduce novel attacks targeting factual distortion and representational bias that evade conventional evaluation metrics.
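The gradient-ascent unlearning step mentioned above can be illustrated on a toy problem. The sketch below is not the preprint's implementation: it assumes a one-parameter linear model, a mean-squared-error loss, and a poisoned subset that has already been identified. All function names, data values, and step sizes are hypothetical and chosen only for illustration.

```python
# Toy illustration of gradient-ascent unlearning (assumed setup, not the paper's code).
# Model: y = w * x with a single scalar weight w.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def fit(data, w=0.0, lr=0.05, steps=200):
    """Ordinary training: gradient descent on the full (possibly poisoned) set."""
    for _ in range(steps):
        w -= lr * mse_grad(w, data)
    return w

def unlearn(w, forget, lr=0.01, steps=3):
    """Unlearning: gradient ASCENT on the loss of the flagged samples only."""
    for _ in range(steps):
        w += lr * mse_grad(w, forget)
    return w

# Hypothetical data: clean samples follow y = 2x, poisoned samples follow y = -2x.
clean = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
poison = [(x, -2.0 * x) for x in (1.0, 2.0)]

w_poisoned = fit(clean + poison)            # weight pulled away from 2.0 by the poison
w_unlearned = unlearn(w_poisoned, poison)   # weight pushed away from fitting the poison

print("clean loss before:", mse(w_poisoned, clean))
print("clean loss after: ", mse(w_unlearned, clean))
```

In this toy setting, a few ascent steps on the flagged samples raise the loss on the poisoned points and lower it on the clean ones. The step count is kept small on purpose: the loss on the forget set is unbounded, so running the ascent for too long would overshoot and damage clean behavior as well.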