dataset
WildChat
dataset · active · provisional
wildchat-ef48d630 · 1 event · first seen 7h ago
Aliases: WildChat
Recent events (1)
Study of security and privacy prompts in the wild reveals LLM response quality gaps and inconsistency
Researchers analyzed 14,727 security and privacy (S&P) prompts drawn from WildChat's 3.2M real user-LLM conversations, categorizing them into nine topic areas and evaluating response quality on 270 advice-seeking prompts. Commercial models substantially outperformed open-weight models (98% 'good enough' responses for GPT vs. 47% for Llama 4), but even high-performing commercial models gave inconsistent responses across repeated runs of the same prompt. The study is the first to analyze real user S&P queries to LLMs rather than expert-authored test sets, and it surfaces both a capability gap and a reliability concern.