Entity · benchmark

SearchGEO

benchmark · active · searchgeo-0a027254 · 1 event · first seen Jun 16, 2026

Aliases: SearchGEO

Co-occurring entities

Claude Sonnet 4, Google Gemini 3 Flash, OpenAI, Anthropic

More like this (12)

GEOS, SearchGPT, eGeMAPS, GraphGPO, Google Research, Google Meet, SearchGen-20K, Google Dataset Search, GSPO, AnalyticGeo7K, Google Gemini, GSO

Recent events (1)

7 · arXiv · cs.CL · Jun 16, 2026

SearchGEO framework measures LLM search agent vulnerability to web content manipulation

Researchers introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a manipulation pipeline, five-mode attack taxonomy, and multiple output metrics. Evaluating 13 LLM backends on 308 cases each, they find attack success rates ranging from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, with model-family-specific vulnerability patterns. An auxiliary probe escalating endorsement to install commands reveals a behavioral split: Claude over-rejects while GPT over-trusts. The findings argue for treating adversarial search content robustness as a first-class safety evaluation dimension for deployed agents.
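The summary above reports per-backend attack success rates but does not say how the paper computes them. As a minimal sketch, assuming the rate is simply the share of evaluated cases in which the agent endorsed the manipulated target, the per-backend tally could look like the following. The function name, the backend labels, and the toy outcomes are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def attack_success_rates(outcomes):
    """Return {backend: attack success rate in percent}.

    `outcomes` is an iterable of (backend, attack_succeeded) pairs, one per
    evaluated case. This assumes a plain success fraction; the paper's
    actual metric definition may differ.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for backend, succeeded in outcomes:
        totals[backend] += 1
        if succeeded:
            successes[backend] += 1
    return {b: 100.0 * successes[b] / totals[b] for b in totals}


# Hypothetical toy data: four cases per backend, not real results.
toy_outcomes = (
    [("backend-a", False)] * 4
    + [("backend-b", True)]
    + [("backend-b", False)] * 3
)
rates = attack_success_rates(toy_outcomes)
```

With the toy data, `backend-a` never endorses the manipulated target and `backend-b` does so in one case out of four. A real run would pass one pair per evaluated case for each backend.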

Evaluation and Benchmarking, AI Safety Research, Claude Sonnet 4, Google Gemini 3 Flash