Entity · benchmark

LoSoNA

benchmark · active · losona-e571ffed · 1 event · first seen Jun 15, 2026

Aliases: LoSoNA

Co-occurring entities

Gemini 3.1 Pro · Claude Fable 5

More like this (12)

MaLoRA · κ-LoRA · LoRA · LoMo · TailLoR · Late-Stage LoRA · LLaMA-Omni · QLoRA · MoE²-LoRA · NoLiMa · LoCoMo · SLORR

Recent events (1)

4 · arXiv · cs.CL · Jun 15, 2026

LoSoNA benchmark evaluates LLM adaptation to implicit local social norms in group chats

Researchers introduce LoSoNA, a benchmark that tests whether LLM-based agents can infer and adapt to unstated local conversational norms in multi-party chat scenarios. Each scenario presents a group-chat transcript in which the non-subject participants implicitly demonstrate a hidden norm, followed by an elicitor turn. Eight frontier and open-weight models are evaluated under four prompting conditions. Naive prompting performs poorly for most models, while explicit norm-aware prompting yields uneven gains: Gemini 3.1 Pro reaches 84.2% and Claude Fable 5 reaches 81.6%. The work adds to growing interest in evaluating LLM social and pragmatic capabilities beyond factual or reasoning tasks.
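As a rough illustration of the setup described above, the sketch below models one scenario and two of the four prompting conditions in Python. It is not the benchmark's actual code or data format: the `Scenario` fields, the `build_prompt` and `adherence_rate` helpers, the wording of the norm-aware hint, and the assumption that the elicitor turn is the last message before the subject model's reply are all hypothetical. The other two prompting conditions are not described in the summary and are left out.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical fields; the summary does not specify the real data format.
    transcript: list[tuple[str, str]]  # (speaker, message) pairs that implicitly show the norm
    elicitor_turn: tuple[str, str]     # assumed: the last message before the subject replies
    hidden_norm: str                   # the unstated norm, kept out of the prompt


def build_prompt(scenario: Scenario, condition: str) -> str:
    """Render a scenario as a prompt under one of two illustrative conditions."""
    lines = [f"{speaker}: {text}" for speaker, text in scenario.transcript]
    lines.append(f"{scenario.elicitor_turn[0]}: {scenario.elicitor_turn[1]}")
    chat = "\n".join(lines)
    if condition == "naive":
        # The model sees only the chat and is asked to continue it.
        return f"{chat}\nYou:"
    if condition == "norm_aware":
        # The model is told that norms may exist, but not what they are.
        hint = "Before replying, infer any unstated norms the group follows and adhere to them."
        return f"{hint}\n\n{chat}\nYou:"
    raise ValueError(f"unknown condition: {condition}")


def adherence_rate(judgments: list[bool]) -> float:
    """Fraction of replies judged to follow the hidden norm."""
    if not judgments:
        raise ValueError("no judgments to score")
    return sum(judgments) / len(judgments)


# Made-up example scenario: everyone in the chat writes in lowercase.
example = Scenario(
    transcript=[("ana", "anyone up for lunch"), ("ben", "sure, noon works")],
    elicitor_turn=("ana", "what about you?"),
    hidden_norm="messages are written entirely in lowercase",
)
naive_prompt = build_prompt(example, "naive")
norm_aware_prompt = build_prompt(example, "norm_aware")
```

In this sketch the hidden norm never appears in either prompt; it would be used only when judging the reply, and `adherence_rate` then aggregates those judgments into a percentage-style score per model and condition.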

Evaluation and Benchmarking · Agent and Tool Ecosystem · Gemini 3.1 Pro · Claude Fable 5 · LoSoNA