paper
Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution
paper · active · provisional
tracing-target-answers-in-poisoned-retrieval-corpora-via-token-influence-attribution-0e8e084c · 1 event · first seen 3d ago
Aliases: Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution
More like this (12)
Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference
How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation
Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning
Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models
How Does Research Evolve? Tracing Cross-Domain Trajectories in NLP, ML, and CV with Claim-Grounded Typed Citations
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It
Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering
Fixed-Payload Poisoning
Privacy Inference Attack
Difference-Aware Retrieval Policies for Imitation Learning
When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval
Recent events (1)
TRACE: Lightweight RAG corpus poisoning detection via token influence attribution
Researchers introduce TRACE, a framework for detecting corpus poisoning attacks on Retrieval-Augmented Generation (RAG) systems. Rather than relying on auxiliary classifiers or LLM-based verification, TRACE traces answer-related tokens through token influence attribution: it identifies high-influence keywords that recur across retrieved documents, then runs a secondary verification step to confirm their effect on the model's predictions. Evaluated on three QA benchmarks and six LLMs, TRACE achieves strong detection performance, exposes the attacker-specified target answers, and incurs lower computational overhead than prior approaches.
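
The two stages described in this summary (finding high-influence keywords that recur across retrieved documents, then verifying their effect on the prediction) might look roughly like the toy sketch below. This is not the authors' implementation: the summary does not say how influence scores are computed or how verification is performed. The per-token influence scores are assumed to be given, and the names `recurrent_keywords`, `verify_keywords`, `top_k`, `min_docs`, and `toy_predict` are hypothetical. Verification is approximated here by deleting a keyword and checking whether the answer changes.

```python
from collections import Counter
from typing import Callable, Dict, List

# Assumption: influence scores are precomputed, one {token: score} dict per document.
InfluenceMap = Dict[str, float]


def recurrent_keywords(doc_scores: List[InfluenceMap],
                       top_k: int = 2, min_docs: int = 2) -> List[str]:
    """Tokens ranked in the top_k by influence in at least min_docs documents."""
    counts: Counter = Counter()
    for scores in doc_scores:
        top = sorted(scores, key=scores.get, reverse=True)[:top_k]
        counts.update(top)
    return sorted(tok for tok, n in counts.items() if n >= min_docs)


def verify_keywords(docs: List[str], keywords: List[str],
                    predict: Callable[[List[str]], str]) -> List[str]:
    """Keep keywords whose removal from all documents changes the prediction."""
    baseline = predict(docs)
    confirmed = []
    for kw in keywords:
        # Hypothetical ablation: drop the keyword wherever it appears.
        ablated = [" ".join(w for w in d.split() if w != kw) for d in docs]
        if predict(ablated) != baseline:
            confirmed.append(kw)
    return confirmed


def toy_predict(docs: List[str]) -> str:
    """Stand-in for a RAG model: answers 'Atlantis' if two or more docs mention it."""
    hits = sum("Atlantis" in d.split() for d in docs)
    return "Atlantis" if hits >= 2 else "Paris"


docs = [
    "the capital is Atlantis today",
    "reports say Atlantis is capital",
    "Paris is the capital of France",
]
doc_scores = [
    {"Atlantis": 0.9, "capital": 0.3, "today": 0.05, "is": 0.02, "the": 0.01},
    {"Atlantis": 0.8, "capital": 0.3, "reports": 0.1, "say": 0.05, "is": 0.02},
    {"Paris": 0.4, "France": 0.3, "capital": 0.2, "is": 0.02, "the": 0.01},
]

candidates = recurrent_keywords(doc_scores)               # ['Atlantis', 'capital']
suspects = verify_keywords(docs, candidates, toy_predict)
print(suspects)                                           # ['Atlantis']
```

In this toy setup, "capital" recurs with high influence but is dropped at verification because removing it leaves the answer unchanged. The surviving keyword is both the poisoning signal and the attacker's target answer, which mirrors the summary's claim that detection also exposes the target.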