Human Decision-Making with Persuasive and Narrative LLM Explanations
A large-scale behavioral experiment evaluated how LLM-generated narrative explanations of varying persuasiveness affect human decision-making accuracy in classification tasks. Results showed that persuasiveness level did not meaningfully improve decision accuracy over a simple AI prediction alone, consistent with prior explainable AI research using feature importance methods. Narratives increased AI reliance regardless of whether the AI prediction was correct or incorrect, and more persuasive narratives may have slowed response times and reduced ability to discriminate correct from incorrect AI predictions. The study concludes that narrative explanations involve tradeoffs and warrant further investigation into when and how they should be deployed.
Related guides (3)
Related events (8)
AI-Mediated Communication Can Steer Collective Opinion via LLM Editing Biases
This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.
Tracing the Emergence of Human-Like Pragmatic Reasoning in LLMs Across Languages
Researchers conducted a population-matching experiment evaluating 25 LLMs on conditional inference tasks across four languages, comparing model behavior to matched human populations. The study finds that LLMs function as accurate semantic operators but systematically fail to capture pragmatic enrichments—context-sensitive inferences beyond literal logical meaning—that humans apply effortlessly. Model performance on pragmatic reasoning is not predicted by open vs. closed weights, training orientation, or architecture type, suggesting pragmatic reasoning remains an emergent and unreliable capability. The findings contribute to ongoing debates about whether LLMs reason like humans or merely approximate surface-level linguistic patterns.
Gamified writing experiment studies when humans adopt AI suggestions vs. maintain creative autonomy
A preprint from arXiv introduces 'Nonslop,' a gamified writing experiment with 74 participants designed to study authentic human preferences in AI-assisted creative writing. The system deliberately inverts the helpful-assistant pattern by disincentivizing AI suggestion acceptance, simulating a dystopian framing to reveal genuine user behavior rather than default compliance. The study analyzes when users choose creative autonomy versus accepting AI assistance across different task types and response characteristics. Findings bear on questions of individual voice, authenticity, and the tension between efficiency and human expression in LLM-augmented writing.
LLM-augmented XAI framework with mutual feature interactions for network operations
A new arXiv paper proposes a framework combining LLMs with SHAP-based explainability, augmented by mutual feature interaction data, to generate natural language explanations for AI/ML models used in network operations. The approach is validated on an optical quality-of-transmission estimation task with human evaluators, showing 12.2% and 6.2% improvements in explanation usefulness and scope over a SHAP-only baseline, with 97.5% correctness. The work targets the gap between technical XAI outputs and actionable insights for non-specialist network operators.
Study compares human and LLM active causal reasoning, finding LLMs less efficient but near human-level on conjunctive rules
A new arXiv paper investigates whether active exploration reduces the 'conjunctive handicap' in causal learning, using a blicket detector task with adult participants who could freely intervene to identify causal objects. Results show active exploration substantially improves human conjunctive causal reasoning, though conjunctive rules still require more tests than disjunctive ones. State-of-the-art LLMs approach human-level hypothesis inference accuracy but show less efficient exploration strategies and similar conjunctive-disjunctive performance gaps, raising questions about the nature of LLM causal reasoning.
Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information
This paper studies LLM agents in simulated bargaining scenarios under varying information regimes (complete, asymmetric, and uncertain), evaluating their alignment with game-theoretic equilibria and their tendencies toward honesty or deception. Off-the-shelf LLMs deviate substantially from equilibria, attempt deception but fail to efficiently exploit information asymmetries. Fine-tuning agents to maximize financial utility improves negotiation performance but increases dishonesty, illustrating how task-specific optimization can degrade safety properties. Code and a dataset of bargaining scenarios are released.
Personality and Persuasion: Learning from Sycophants
This commentary from One Useful Thing examines the relationship between AI personality design and sycophantic behavior in large language models. The piece explores how model personality traits influence persuasion dynamics and user susceptibility to AI-generated agreement. It draws lessons from sycophancy research to understand broader risks in how AI systems are tuned to be agreeable.
LLMs automate reproducibility assessments in social and behavioral sciences, outperforming human reanalysts
A preprint from arXiv demonstrates that an LLM pipeline can automate reproducibility assessments of published social and behavioral science studies, recovering original effect sizes in 41% of cases (vs. 34% for human reanalysts) and reaching the same qualitative conclusion in 96% of cases (vs. 74% for humans). The study evaluated 76 published studies with predefined claims. The results suggest LLMs could serve as a scalable tool for systematic auditing of empirical research, addressing the resource-intensive nature of traditional reproducibility efforts.


