Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

paper · active · provisional · tracing-target-answers-in-poisoned-retrieval-corpora-via-token-influence-attribution-0e8e084c · 1 event · first seen 3d ago

Recent events (1)

arXiv · cs.CL · 3d ago

TRACE: Lightweight RAG corpus poisoning detection via token influence attribution

Researchers introduce TRACE, a detection framework for corpus poisoning attacks on Retrieval-Augmented Generation (RAG) systems that works by tracing answer-related tokens through token influence attribution rather than relying on auxiliary classifiers or LLM-based verification. The method identifies recurrent high-influence keywords across retrieved documents and performs secondary verification to confirm their effect on model predictions. Evaluated on three QA benchmarks and six LLMs, TRACE achieves strong detection performance while also exposing attacker-specified target answers, with lower computational overhead than prior approaches.
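The two-stage idea in the summary (find high-influence keywords that recur across retrieved documents, then verify their effect on the prediction) can be illustrated with a toy sketch. This is not the TRACE implementation: the "model" here is a hypothetical stand-in that answers with the most frequently mentioned candidate, influence is approximated by leave-one-out ablation, and the names `toy_answer`, `confidence`, `recurrent_keywords`, `verify`, `top_k`, and `min_docs` are all assumptions made for illustration.

```python
from collections import Counter

def confidence(docs, answer, candidates):
    """Toy confidence: share of candidate mentions that equal `answer`."""
    mentions = [t for doc in docs for t in doc if t in candidates]
    return mentions.count(answer) / len(mentions) if mentions else 0.0

def toy_answer(docs, candidates):
    """Hypothetical stand-in for an LLM: pick the most-mentioned candidate."""
    counts = Counter(t for doc in docs for t in doc if t in candidates)
    return max(sorted(counts), key=counts.get) if counts else None

def recurrent_keywords(docs, candidates, top_k=1, min_docs=2):
    """Stage 1: per document, keep the top-k tokens by leave-one-out
    influence on the current answer; return tokens recurring in >= min_docs."""
    answer = toy_answer(docs, candidates)
    base = confidence(docs, answer, candidates)
    seen = Counter()
    for i, doc in enumerate(docs):
        scores = {}
        for tok in set(doc):
            ablated = docs[:i] + [[t for t in doc if t != tok]] + docs[i + 1:]
            scores[tok] = base - confidence(ablated, answer, candidates)
        top = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
        seen.update(t for t in top if scores[t] > 0)
    return [t for t, n in seen.items() if n >= min_docs]

def verify(docs, keyword, candidates):
    """Stage 2: does masking `keyword` everywhere change the answer?"""
    before = toy_answer(docs, candidates)
    masked = [[t for t in doc if t != keyword] for doc in docs]
    return toy_answer(masked, candidates) != before

# Made-up retrieved set: one clean document, three poisoned ones.
candidates = {"paris", "lyon"}
docs = [
    ["the", "capital", "of", "france", "is", "paris"],
    ["lyon", "is", "the", "capital", "lyon"],
    ["officially", "lyon", "is", "capital", "lyon"],
    ["records", "say", "lyon", "lyon"],
]
flagged = recurrent_keywords(docs, candidates)
confirmed = [kw for kw in flagged if verify(docs, kw, candidates)]
```

In this toy setup the confirmed keyword is also the injected answer, which mirrors the summary's claim that detection can expose the attacker-specified target. A real system would replace the ablation loop with an attribution method computed on the LLM itself.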