Entity · paper

One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

paperactiveone-polluted-page-is-enough-evaluating-web-content-pollution-in-generative-recommenders-46574e5f·1 events·first seen Jun 12, 2026

Aliases: One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

Co-occurring entities

FORGE

More like this (12)

Personalized PageRank Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies Curated retrieval versus open web search in public AI information services: a coverage-trust trade-off Semantic Browsing: Controllable Diversity for Image Generation MIRAGE: Defending Long-Form RAG Against Misinformation Pollution Understanding the Behaviors of Environment-aware Information Retrieval Beyond a Single Judge: Simulating Social Persona Panels for Generative UI Evaluation Two-Level Meta-Rubrics for Evaluating Open-Ended Generation: GAMUT, a Benchmark for Factual Completeness GEIS: A Generation-Evaluation-Improvement Loop of Agent Skills for Long-Form Article Generation ToxiREX: A Dataset on Toxic REasoning in ConteXt Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning Pretraining Data Can Be Poisoned through Computational Propaganda

Recent events (1)

6arXiv · cs.AI·Jun 12, 2026·source ↗

FORGE benchmark reveals search-augmented LLMs vulnerable to fake product promotion via web content pollution

Researchers introduce FORGE, a benchmark measuring how often search-augmented LLMs recommend fake products when retrieval results are polluted with fabricated reviews or promotional pages. Across 12 commercial and open-weights models, a single polluted page causes fooled rates up to 27%, rising to 73.8% when all top-3 results are replaced. Notably, chain-of-thought reasoning does not mitigate the vulnerability and often generates spurious social proof to justify false recommendations. Three defenses tested—skepticism prompting, model-prior filtering, and cross-document consensus—each carry significant drawbacks.

Evaluation and Benchmarking AI Safety Research One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders FORGE +1 more