benchmark
LoSoNA
benchmark · active · provisional
losona-e571ffed · 1 event · first seen 2d ago
Aliases: LoSoNA
Recent events (1)
LoSoNA benchmark evaluates LLM adaptation to implicit local social norms in group chats
Researchers introduce LoSoNA, a benchmark that tests whether LLM-based agents can infer and adapt to unstated local conversational norms in multi-party chat. Each scenario presents a group-chat transcript in which the non-subject participants implicitly demonstrate a hidden norm, followed by an elicitor turn that prompts the subject to respond. Eight frontier and open-weight models are evaluated under four prompting conditions. Naive prompting performs poorly for most models, while explicit norm-aware prompting yields uneven gains: Gemini 3.1 Pro reaches 84.2% and Claude Fable 5 reaches 81.6%. The work adds to growing interest in evaluating the social and pragmatic capabilities of LLMs, beyond factual or reasoning tasks.
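The scenario layout described above (transcript, hidden norm, elicitor turn) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the benchmark's actual schema or prompts: the names `Turn`, `Scenario`, `build_prompt`, and `NORM_AWARE_HINT`, the hint wording, and the example norm are all hypothetical. Only two of the four prompting conditions are named in the summary, so only those two are modeled.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class Scenario:
    # Hypothetical record layout; the real LoSoNA format is not given in the summary.
    transcript: list[Turn]  # turns where other participants implicitly follow the norm
    elicitor: Turn          # final turn that invites the subject to reply
    hidden_norm: str        # kept out of the prompt; would be used for scoring only


# Assumed wording: asks the model to attend to conventions without stating the norm.
NORM_AWARE_HINT = "Notice any unstated conventions the group follows and match them."


def build_prompt(scenario: Scenario, subject: str, norm_aware: bool = False) -> str:
    """Render a scenario as a chat prompt for the subject participant."""
    header = f"You are {subject} in this group chat. Write your next message."
    if norm_aware:
        header += " " + NORM_AWARE_HINT
    turns = scenario.transcript + [scenario.elicitor]
    body = "\n".join(f"{t.speaker}: {t.text}" for t in turns)
    return f"{header}\n\n{body}\n{subject}:"


# Hypothetical example: the group writes in lowercase without punctuation.
example = Scenario(
    transcript=[
        Turn("ana", "anyone up for lunch"),
        Turn("ben", "yes give me ten minutes"),
    ],
    elicitor=Turn("ana", "sam are you coming"),
    hidden_norm="Messages are lowercase with no punctuation.",
)

naive_prompt = build_prompt(example, "sam")
aware_prompt = build_prompt(example, "sam", norm_aware=True)
```

In this sketch the hidden norm never appears in either prompt; the two conditions differ only in whether the header carries the generic hint, which is one plausible reading of "naive" versus "norm-aware" prompting.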