paper

When Certainty Is an Artifact: Keyword Lexicon Blindness and the (Mis)Measurement of Rhetorical Stance

Recent events (1)

arXiv · cs.CL · 4d ago

LLM-based classification exposes keyword lexicon artifacts in computational social science stance measurement

A new arXiv preprint demonstrates that statistically significant findings in computational social science can be entirely measurement artifacts of keyword-based scoring instruments. Analyzing 85 interviews with four public intellectuals, the authors show that keyword-based certainty scores produce strong correlations (r=0.72–0.93) that collapse or invert when the keyword scores are replaced with zero-shot LLM semantic classification of 32,625 sentences. The paper identifies three structural failure modes in keyword lexicons (syntactic blindness, polysemy blindness, and categorical absence) and argues that keyword counts measure lexical co-occurrence tendencies rather than rhetorical stance. The work has implications for the validity of prior NLP-based social science research and for the comparative utility of LLMs as measurement instruments.
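To make the first two failure modes concrete, here is a minimal sketch of a keyword-count certainty score. The lexicon, the function name `keyword_certainty_score`, and the example sentences are all hypothetical and are not taken from the preprint. Reading "syntactic blindness" as insensitivity to negation and "polysemy blindness" as insensitivity to word sense is likewise an assumption, since the summary does not define the terms.

```python
import re

# Hypothetical certainty lexicon, invented for illustration (not the paper's).
CERTAINTY_TERMS = {"certain", "certainly", "clearly", "definitely", "sure"}


def keyword_certainty_score(sentence: str) -> int:
    """Count lexicon hits in a sentence, ignoring syntax and word sense."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(token in CERTAINTY_TERMS for token in tokens)


examples = [
    "I am certain this policy will fail.",      # confident stance
    "I am not certain this policy will fail.",  # negated, so actually hedged
    "A certain economist once told me this.",   # 'certain' means 'some particular'
]

scores = [keyword_certainty_score(s) for s in examples]
for sentence, score in zip(examples, scores):
    print(score, sentence)
```

All three sentences receive the same score, although only the first expresses certainty. A classifier that reads the whole sentence could in principle separate them, which is the kind of contrast the preprint draws between keyword counts and LLM-based semantic classification.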
