arXiv cs.CL (Computation and Language) · 5d ago

Persuasion Index: Theory-grounded taxonomy and open-source tool for analyzing rhetorical persuasion

Researchers introduce Persuasion Index (PI), a 15-dimension taxonomy of persuasive rhetorical cues grounded in psychology and communication theory and implemented through 55 sub-features built on lexicons and rule-based detectors. Evaluated on four public datasets spanning multiple domains, PI is shown to provide an interpretable, computationally lightweight predictive signal for persuasion-related outcomes. The framework is released as an open-source package and a web interface; the stated applications include AI safety and the detection of information manipulation.
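To make the "lexicons plus rule-based detectors" idea concrete, here is a minimal sketch of how such cue features can be computed. It is not PI's implementation: the three mini-lexicons (`scarcity`, `urgency`, `social_proof`), the percentage regex, and the function names are all hypothetical stand-ins for PI's actual 15 dimensions and 55 sub-features, which the summary does not list.

```python
import re
from collections import Counter

# HYPOTHETICAL mini-lexicons, invented for illustration; they are not PI's word lists.
LEXICONS = {
    "scarcity": {"limited", "only", "last", "exclusive", "rare"},
    "urgency": {"now", "today", "immediately", "hurry", "deadline"},
    "social_proof": {"everyone", "millions", "popular", "join", "trusted"},
}

# HYPOTHETICAL rule-based detector: explicit percentage claims such as "90%".
NUMERIC_CLAIM = re.compile(r"\b\d+(?:\.\d+)?\s*%")


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; deliberately simple (digits are dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def cue_features(text: str) -> dict[str, float]:
    """Per-1,000-token rate for each lexicon, plus a raw count from the regex rule."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # guard against division by zero on empty input
    counts = Counter(tokens)
    features = {
        name: 1000 * sum(counts[w] for w in words) / n
        for name, words in LEXICONS.items()
    }
    features["numeric_claims"] = float(len(NUMERIC_CLAIM.findall(text)))
    return features


if __name__ == "__main__":
    print(cue_features("Hurry, only 3 left! Join millions of happy customers now. 90% agree."))
```

Because each feature is a simple count or rate, the resulting vectors can be fed to any lightweight classifier while remaining directly interpretable, which is the trade-off the summary highlights.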

Evaluation and Benchmarking · AI Safety Research · Persuasion Index · Persuasion Index: A Theory-Guided Framework for Persuasion Analysis

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

One Useful Thing · 1mo ago

Personality and Persuasion: Learning from Sycophants

This commentary from One Useful Thing examines the relationship between AI personality design and sycophantic behavior in large language models. The piece explores how model personality traits influence persuasion dynamics and user susceptibility to AI-generated agreement. It draws lessons from sycophancy research to understand broader risks in how AI systems are tuned to be agreeable.

AI Safety Research · Alignment and RLHF · Ethan Mollick · One Useful Thing · sycophancy

arXiv · cs.CL · 1mo ago

MA²P: A Meta-Cognitive Multi-Agent Framework for Complex Persuasion

The paper introduces MA²P, a multi-agent framework designed for complex persuasion tasks where the persuadee's internal states are latent. The system coordinates perception management, mental-state inference, strategy execution, memory, and evaluation modules, and adds a meta-cognitive configurator that selects domain-appropriate strategies from a structured knowledge base to reduce cross-domain performance variance. Experiments show higher persuasion success rates compared to baselines. The work addresses a known weakness of LLMs in producing generic or weakly grounded persuasive responses.

Agent and Tool Ecosystem · Alignment and RLHF · large language models · meta-cognitive configurator · MA²P

arXiv · cs.CL · 4d ago

IMPACTeen: Annotated dataset for social influence detection in adolescent communication contexts

IMPACTeen is a new Polish/English bilingual dataset of 1,021 social influence scenarios targeting adolescent communication contexts, with 5,100 annotation records from five distinct annotator perspectives (teenagers, parents, psychologists, communication experts, teachers). The dataset covers influence techniques, intentions, consequences, and resistance, and was constructed via constrained LLM generation followed by human editing. It is intended to support research on social influence detection, annotator disagreement modeling, cross-lingual NLP, and LLM training and evaluation.

Evaluation and Benchmarking · IMPACTeen

arXiv · cs.CL · 46h ago

PsyScore: Psychometrically-aware framework integrating IRT scoring with ZPD-scaffolded LLM feedback for essay assessment

PsyScore is a new framework for Automated Essay Scoring (AES) that unifies diagnostic assessment and instructional feedback through a shared latent ability representation. It combines a neural Item Response Theory scorer (based on the Graded Partial Credit Model) with a multi-agent LLM feedback generator conditioned on estimated student proficiency, operationalizing Vygotsky's Zone of Proximal Development. Experiments on the ASAP++ dataset show competitive scoring performance alongside more pedagogically aligned feedback. The work addresses a gap between psychometric rigor and LLM-based adaptive instruction.
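The summary names a "Graded Partial Credit Model" but does not give its parameterization. As a rough illustration of how an IRT scorer turns a latent ability θ into score-category probabilities, the sketch below uses the generalized partial credit model (GPCM), a standard polytomous IRT model, as an assumed stand-in: P(X = k | θ) ∝ exp(Σ_{j≤k} a(θ − b_j)), with the empty sum for k = 0 equal to 0. The function names and the hand-set item parameters are hypothetical; in PsyScore, a neural network would presumably supply θ, but how it does so is not described here.

```python
import math


def gpcm_probs(theta: float, a: float, steps: list[float]) -> list[float]:
    """Category probabilities P(X = k | theta) for k = 0..len(steps) under the GPCM.

    theta: latent ability; a: item discrimination; steps: step thresholds b_1..b_m.
    Setting a = 1 recovers Masters' (unweighted) partial credit model.
    """
    # Cumulative logits: z_0 = 0, z_k = sum_{j<=k} a * (theta - b_j).
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    # Softmax over the cumulative logits, shifted by the max for numerical stability.
    zmax = max(z)
    exps = [math.exp(v - zmax) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(theta: float, a: float, steps: list[float]) -> float:
    """Expected item score E[X | theta] = sum_k k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))


if __name__ == "__main__":
    # Hypothetical rubric item with scores 0..3 (three step thresholds).
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(expected_score(theta, 1.2, [-1.0, 0.0, 1.0]), 3))
```

The expected score rises monotonically with θ, which is what lets a single latent ability estimate serve both scoring and, as the summary describes, the proficiency conditioning for feedback.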

Evaluation and Benchmarking · Agent and Tool Ecosystem · PsyScore · Graded Partial Credit Model · ASAP++

arXiv · cs.AI · 26d ago

Human Decision-Making with Persuasive and Narrative LLM Explanations

A large-scale behavioral experiment evaluated how LLM-generated narrative explanations of varying persuasiveness affect human decision-making accuracy in classification tasks. Results showed that the level of persuasiveness did not meaningfully improve decision accuracy over showing the AI prediction alone, consistent with prior explainable AI research using feature importance methods. Narratives increased reliance on the AI regardless of whether its prediction was correct or incorrect, and more persuasive narratives may have slowed response times and reduced participants' ability to distinguish correct from incorrect AI predictions. The study concludes that narrative explanations involve tradeoffs and warrant further investigation into when and how they should be deployed.

Evaluation and Benchmarking · AI Safety Research · Narrative Explanations · large language models · Explainable AI (XAI)

arXiv · cs.CL · 3d ago

PseudoBench: Benchmark reveals agentic AI research systems readily produce pseudoscientific outputs

PseudoBench is a new adversarial benchmark evaluating whether agentic auto-research systems can identify and resist pseudoscientific narratives, containing 200 curated claim-evidence pairs across five domains. Testing seven state-of-the-art agents, the authors find near-zero refusal rates and a maximum resistance rate of only 27.4%, meaning current systems readily generate persuasive pseudoscientific reports. A notable finding is that stronger agents package pseudoscience in more sophisticated language, increasing its apparent credibility rather than reducing harm. The authors call for 'scientific alignment' as a prerequisite for deploying autonomous research agents.
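The summary reports two headline metrics, refusal rate and resistance rate, without defining how each agent run is labeled. The sketch below shows one plausible way such rates could be tallied per agent and how a "maximum resistance rate" across agents would be picked out. The outcome labels (`refused`, `resisted`, `complied`) and the choice to count a refusal as a form of resistance are assumptions, not PseudoBench's documented scheme.

```python
from collections import Counter

# HYPOTHETICAL outcome labels. Assumption: a refusal also counts as resistance
# (no pseudoscientific report was produced); "complied" means the agent produced one.
RESISTING = {"refused", "resisted"}


def rates(outcomes: list[str]) -> dict[str, float]:
    """Refusal and resistance rates over one agent's runs (e.g. one per claim-evidence pair)."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes to score")
    c = Counter(outcomes)
    return {
        "refusal_rate": c["refused"] / n,
        "resistance_rate": sum(c[o] for o in RESISTING) / n,
    }


def best_agent(results: dict[str, list[str]]) -> tuple[str, float]:
    """Return the agent with the highest resistance rate, together with that rate."""
    scored = {agent: rates(runs)["resistance_rate"] for agent, runs in results.items()}
    agent = max(scored, key=scored.get)
    return agent, scored[agent]


if __name__ == "__main__":
    # Toy data for two hypothetical agents, not PseudoBench results.
    demo = {
        "agent_a": ["complied"] * 7 + ["resisted"] * 3,
        "agent_b": ["complied"] * 9 + ["refused"],
    }
    print(best_agent(demo))
```

Under this labeling, an agent can have a near-zero refusal rate yet a nonzero resistance rate, which matches the shape of the reported findings.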

Evaluation and Benchmarking · AI Safety Research · PseudoBench

Anthropic News · 19d ago

Anthropic Publishes Political Even-Handedness Evaluation for Claude, Open-Sources Methodology

Anthropic has released a detailed account of how it trains and evaluates Claude for political even-handedness, including character traits instilled via reinforcement learning since early 2024 and a new automated evaluation methodology. The evaluation tests thousands of prompts across hundreds of political stances and benchmarks Claude Sonnet 4.5 against GPT-5, Llama 4, Grok 4, and Gemini 2.5 Pro, finding Claude comparable to Grok 4 and Gemini 2.5 Pro and more even-handed than GPT-5 and Llama 4. Anthropic is open-sourcing the evaluation framework to encourage shared industry standards for measuring political bias. The post also discloses the specific system prompt language used on Claude.ai to enforce even-handed behavior.

Frontier Model Releases · Evaluation and Benchmarking · claude.ai · Claude Sonnet 4.5 · Grok 4

arXiv · cs.CL · 25d ago

AI-Assisted Systematization for Evaluating GenAI Systems

This paper addresses a foundational gap in GenAI evaluation: the underspecification of broad, contested concepts like 'reasoning,' 'fairness,' or 'creativity.' The authors introduce a structured artifact called a 'concept spec' and a validation worksheet, then build two AI-assisted systematizers—a zero-shot approach and a multi-agent approach—to convert vague evaluation targets into measurable, structured accounts. They apply these tools to hate-based rhetoric and digital empathy, assessing the resulting specs on content validity and information recoverability. The work positions AI assistance as a scalable aid for the cognitively demanding process of evaluation design.

Evaluation and Benchmarking · AI Safety Research · hate-based rhetoric · concept spec · digital empathy