AI-Mediated Communication Can Steer Collective Opinion via LLM Editing Biases
This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.
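To make the amplification claim concrete, here is a minimal toy sketch of how a small per-message editing bias could accumulate in a network. It is not the authors' model, whose exact form is not given in this summary. It assumes a DeGroot-style averaging rule on a ring of 10 agents; the function `simulate`, the parameter `edit_bias`, the 50/50 self-weight, and all numeric values are hypothetical choices for illustration only.

```python
def simulate(opinions, neighbors, edit_bias, steps):
    """Toy DeGroot-style averaging in which every transmitted opinion
    is shifted by a constant edit_bias (an assumed stand-in for LLM editing)."""
    x = list(opinions)
    for _ in range(steps):
        new_x = []
        for i, nbrs in enumerate(neighbors):
            # Opinions as received: sender's opinion plus the assumed edit shift,
            # clipped to the opinion range [-1, 1].
            received = [max(-1.0, min(1.0, x[j] + edit_bias)) for j in nbrs]
            # Each agent keeps half of its own view and takes half from
            # the average of the (edited) messages it received.
            new_x.append(0.5 * x[i] + 0.5 * sum(received) / len(received))
        x = new_x
    return x


def mean(values):
    return sum(values) / len(values)


# Hypothetical setup: 10 agents on a ring, opinions spread evenly around zero.
n = 10
initial = [-0.45 + 0.1 * i for i in range(n)]
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]

unbiased = simulate(initial, ring, edit_bias=0.0, steps=50)
biased = simulate(initial, ring, edit_bias=0.02, steps=50)

print(f"mean opinion without edit bias: {mean(unbiased):+.3f}")
print(f"mean opinion with edit bias:    {mean(biased):+.3f}")
```

In this toy setup the unbiased run keeps the mean opinion at zero, while a shift of only 0.02 per message moves the mean by about 0.01 per step, reaching roughly 0.5 after 50 steps. The point is only that a bias too small to notice in a single edit can compound through repeated exchange; the magnitudes carry no empirical meaning.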
Related events (8)
LLMs fail to consistently simulate demographic perspective-taking in hate speech annotation
A new arXiv paper evaluates whether persona-conditioned LLMs can replicate how different demographic groups perceive hate speech, testing three dimensions: inter-group disagreement, in-group sensitivity, and vicarious prediction. No model consistently captures all three dimensions, and performance is highly model-dependent rather than emerging reliably from identity prompts alone. Vicarious prompting with Llama 3.1 provides the closest approximation to human disagreement patterns across demographic axes. The findings have implications for using LLMs as proxies for diverse human annotators in content moderation tasks.
AMEL: Accumulated Message Effects Bias LLM Judgments in Multi-Turn Evaluation Pipelines
This paper introduces AMEL (Accumulated Message Effect on LLM Judgments), documenting that prior conversation history with predominantly positive or negative evaluations systematically biases subsequent LLM judgments toward the prevailing polarity. Across 75,898 API calls to 11 models from 4 providers, the effect is statistically robust (d = -0.17, p < 10^-46), concentrates on high-uncertainty items, and shows a negativity asymmetry where negative histories induce 1.62x more bias than positive ones. Critically, the bias does not grow with context length, scaling reduces but does not eliminate it, and the simplest mitigation is using a fresh context per evaluation item.
Defining and Evaluating Political Bias in LLMs
OpenAI has published a post describing its methodology for evaluating political bias in ChatGPT, introducing new real-world testing approaches aimed at improving objectivity and reducing bias. The piece outlines how OpenAI defines political bias in the context of large language models and the evaluation frameworks it is developing to measure it. The post presents systematic bias measurement as a component of responsible deployment.
Counterfactual context revision framework for auditing LLM-based stance simulation in online discussions
Researchers introduce a counterfactual context revision framework to audit how LLMs simulate individual users' stances in online discussions. By applying controlled text-only and multimodal (meme-based) revisions to conversational contexts, they measure how readily simulated stances shift in response to semantically independent changes. Results show effective and robust stance transitions across both revision types and polarization-preference mechanisms, raising concerns about whether LLM simulations reflect genuine user-specific beliefs or are highly context-sensitive artifacts. The work contributes an evaluation framework and highlights risks of using LLMs to model online opinion dynamics.
Human Decision-Making with Persuasive and Narrative LLM Explanations
A large-scale behavioral experiment evaluated how LLM-generated narrative explanations of varying persuasiveness affect human decision-making accuracy in classification tasks. Results showed that persuasiveness level did not meaningfully improve decision accuracy over a simple AI prediction alone, consistent with prior explainable AI research using feature importance methods. Narratives increased AI reliance regardless of whether the AI prediction was correct or incorrect, and more persuasive narratives may have slowed response times and reduced ability to discriminate correct from incorrect AI predictions. The study concludes that narrative explanations involve tradeoffs and warrant further investigation into when and how they should be deployed.
Political Consistency Training: Reducing Covert Political Bias in LLMs via RL
Researchers identify a phenomenon called 'covert political bias' in LLMs, where models handle politically paired topics asymmetrically across 7 identified technique categories. They propose two metrics—Sentiment Consistency and Helpfulness Consistency—to measure this asymmetry. To address it, they introduce Political Consistency Training (PCT), an RL-based method with complementary training paradigms that reduces covert bias while preserving overall helpfulness and generalizing to held-out benchmarks.
Situated Interaction Auditing: A user-centered framework for LLM bias research
Researchers propose Situated Interaction Auditing (SIA), a new framework for studying LLM bias from the perspective of the user rather than third-party demographic representation. The core insight is that bias can manifest in how a model treats its interlocutor — varying response quality, content, and tone based on implicit sociodemographic signals, writing style, or stated identity — rather than only in how it describes external groups. The paper demonstrates SIA through a case study intersecting gender and socioeconomic status signals across multiple task domains and outlines a research agenda for the approach.
Contagion Networks: formal framework for measuring evaluator bias propagation in multi-agent LLM systems
A new arXiv preprint introduces Contagion Networks, a formal framework for quantifying how systematic evaluation biases spread across interacting LLM agents in multi-agent systems. Using a controlled 3-agent experiment with DeepSeek-chat, the authors measure a Cross-Agent Contagion Matrix and find that homogeneous-model agents produce contagion coefficients 3-5x weaker than cross-model settings. A key practical finding is that increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%, offering a concrete mitigation strategy. The authors release an open-source experimental framework alongside the paper.


