Situated Interaction Auditing: A user-centered framework for LLM bias research
Researchers propose Situated Interaction Auditing (SIA), a new framework for studying LLM bias from the perspective of the user rather than through third-party demographic representation. The core insight is that bias can manifest in how a model treats its interlocutor, varying response quality, content, and tone based on implicit sociodemographic signals, writing style, or stated identity, and not only in how it describes external groups. The paper demonstrates SIA through a case study intersecting gender and socioeconomic status signals across multiple task domains and outlines a research agenda for the approach.
Related events (8)
LLMs fail to consistently simulate demographic perspective-taking in hate speech annotation
A new arXiv paper evaluates whether persona-conditioned LLMs can replicate how different demographic groups perceive hate speech, testing three dimensions: inter-group disagreement, in-group sensitivity, and vicarious prediction. No model consistently captures all three dimensions, and performance is highly model-dependent rather than emerging reliably from identity prompts alone. Vicarious prompting with Llama 3.1 provides the closest approximation to human disagreement patterns across demographic axes. The findings have implications for using LLMs as proxies for diverse human annotators in content moderation tasks.
Counterfactual context revision framework for auditing LLM-based stance simulation in online discussions
Researchers introduce a counterfactual context revision framework to audit how LLMs simulate individual users' stances in online discussions. By applying controlled text-only and multimodal (meme-based) revisions to conversational contexts, they measure how readily simulated stances shift in response to semantically independent changes. Results show that the revisions reliably shift simulated stances across both revision types and polarization-preference mechanisms, raising concerns about whether LLM simulations reflect genuine user-specific beliefs or are highly context-sensitive artifacts. The work contributes an evaluation framework and highlights risks of using LLMs to model online opinion dynamics.
StylisticBias benchmark reveals a small set of visual cues drives most social bias in MLLMs
Researchers introduce StylisticBias, a controlled benchmark of ~25K photorealistic face images with single-attribute variations designed to isolate how specific visual cues shift social judgments in multimodal LLMs. Evaluating six MLLMs across 25 binary social judgment scenarios, they find that age and body type dominate identity-level effects, while fashion style drives the largest attribute-level shifts, with ~15 attributes accounting for ~80% of total bias variation. The benchmark is released publicly on GitHub and Hugging Face, enabling fine-grained bias auditing of multimodal models.
AI-Mediated Communication Can Steer Collective Opinion via LLM Editing Biases
This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.
Action research documents 'Index Sickness' failure pattern in long-horizon LLM collaboration and proposes fix
A practitioner-researcher documents a failure mode called 'Index Sickness' observed across 391 consecutive LLM collaboration sessions on a real software project (Bang-v3): when symbolic identifier systems and rule-based System Prompts exceed a complexity threshold, LLMs abandon semantic grounding and produce internally consistent but reality-disconnected outputs. The paper names the underlying principle the 'Pang Principle (Semantic Vitality Law),' asserting that natural language with explicit purpose conveys higher information quality than symbolic expression. A proposed engineering fix, 'Baseline-Log Physical Separation,' reduced AI instruction volume by ~75% and eliminated recurrence over ~150 subsequent sessions. The work is action research rather than a controlled experiment, but offers rare longitudinal empirical data on LLM degradation in long-horizon agentic workflows.
Defining and Evaluating Political Bias in LLMs
OpenAI has published a post describing its methodology for evaluating political bias in ChatGPT, introducing new real-world testing approaches aimed at improving objectivity and reducing bias. The piece outlines how OpenAI defines political bias in the context of large language models and the evaluation frameworks it is developing to measure it. This represents OpenAI's public commitment to systematic bias measurement as a component of responsible deployment.
Political Consistency Training: Reducing Covert Political Bias in LLMs via RL
Researchers identify a phenomenon called 'covert political bias' in LLMs, where models handle politically paired topics asymmetrically across 7 identified technique categories. They propose two metrics, Sentiment Consistency and Helpfulness Consistency, to measure this asymmetry. To address it, they introduce Political Consistency Training (PCT), an RL-based method with complementary training paradigms that reduces covert bias while preserving overall helpfulness and generalizing to held-out benchmarks.
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
This paper identifies and analyzes 'Perceptual Judgment Bias' in multimodal LLM judges, where models anchor on response text rather than visual evidence when the two conflict. The authors introduce a Perceptually Perturbed Judgment Dataset using counterfactual responses to isolate perceptual errors, and a training framework combining GRPO-based reward modeling with batch-ranking objectives. Experiments on MLLM-as-a-Judge benchmarks show improved perceptual fidelity, ranking coherence, and alignment with human evaluation.

