Geopolitical Bias in LLMs Originates in Post-Training, Not Pre-Training Data
A study testing seven open-weight LLM pairs (base vs. chat models) across seven labs finds that geopolitical bias is introduced during post-training rather than inherited from pre-training data. Six of seven labs showed post-training shifts favoring the developer's home country or region, with Alibaba's Qwen 2.5 showing the most extreme shift (18x increase in China-favourability log-odds). The effect is also language-dependent: Mistral becomes pro-France only under French prompting. The authors argue this implicates alignment and RLHF processes as active shapers of geopolitical perspective, calling for greater transparency and auditing of post-training pipelines.
Related guides (4)
Related events (8)
The Shibboleth Effect: Cross-lingual behavioral skew in frontier LLMs under adversarial geopolitical simulation
Researchers introduce the 'Shibboleth Effect' — systematic behavioral differences in LLMs when operating in different languages — and audit six frontier models (GPT-4o, Llama-4, Mistral-Large, Gemini-3.1-Pro, Qwen3.6-Plus, DeepSeek-R1) using a synthetic maritime territorial dispute wargame played in English versus Turkish. Results are heterogeneous: Llama-4 becomes significantly more coercive in Turkish while Gemini-3.1-Pro and DeepSeek-R1 become less so, and GPT-4o shows no detectable shift. The study identifies two candidate buffering mechanisms — chain-of-thought institutional anchoring and multilingual RLHF alignment — with direct implications for deploying LLMs in diplomatic or crisis-management contexts.
Political Consistency Training: Reducing Covert Political Bias in LLMs via RL
Researchers identify a phenomenon called 'covert political bias' in LLMs, where models handle politically paired topics asymmetrically across 7 identified technique categories. They propose two metrics—Sentiment Consistency and Helpfulness Consistency—to measure this asymmetry. To address it, they introduce Political Consistency Training (PCT), an RL-based method with complementary training paradigms that reduces covert bias while preserving overall helpfulness and generalizing to held-out benchmarks.
Defining and Evaluating Political Bias in LLMs
OpenAI has published a post describing their methodology for evaluating political bias in ChatGPT, introducing new real-world testing approaches aimed at improving objectivity and reducing bias. The piece outlines how OpenAI defines political bias in the context of large language models and the evaluation frameworks they are developing to measure it. This represents OpenAI's public commitment to systematic bias measurement as a component of responsible deployment.
Study finds state media in training data causes LLMs to reflect government propaganda in native languages
Researchers from University of Oregon, Purdue, UCSD, NYU, and Princeton found that state-controlled media is heavily overrepresented in web-scraped training datasets, causing Claude 3 Sonnet and GPT-4o to express significantly more favorable attitudes toward authoritarian governments when prompted in those governments' native languages. Chinese state media accounts for over 40x more documents in CulturaX than Chinese Wikipedia, and both models reproduced state-media strings at 3-5% rates. When prompted in Chinese, both models favored China's government roughly 68-75% of the time versus English prompts on the same topics, with the effect scaling with a country's World Press Freedom Index ranking.
Location metadata causes systematic geographic bias leakage in LLMs, even with 'Unknown' placeholders
Researchers evaluate 'location leakage' — the phenomenon where LLMs generate geographically biased outputs when exposed to location metadata in user profiles, even when prompts are geographically neutral. Across creative writing and Q&A tasks, leakage spikes up to 793x above baseline for models including Llama 3.1-8B, Qwen3-8B, and Claude Sonnet 4.6. A novel structural finding shows that replacing location with 'Unknown' still elevates leakage by up to 72x, indicating the user profile frame itself acts as a conditioning signal independent of geographic content. This has direct implications for AI systems that use user metadata for localization.
AMEL: Accumulated Message Effects Bias LLM Judgments in Multi-Turn Evaluation Pipelines
This paper introduces AMEL (Accumulated Message Effect on LLM Judgments), documenting that prior conversation history with predominantly positive or negative evaluations systematically biases subsequent LLM judgments toward the prevailing polarity. Across 75,898 API calls to 11 models from 4 providers, the effect is statistically robust (d = -0.17, p < 10^-46), concentrates on high-uncertainty items, and shows a negativity asymmetry where negative histories induce 1.62x more bias than positive ones. Critically, the bias does not grow with context length, scaling reduces but does not eliminate it, and the simplest mitigation is using a fresh context per evaluation item.
LLMs fail to consistently simulate demographic perspective-taking in hate speech annotation
A new arXiv paper evaluates whether persona-conditioned LLMs can replicate how different demographic groups perceive hate speech, testing three dimensions: inter-group disagreement, in-group sensitivity, and vicarious prediction. No model consistently captures all three dimensions, and performance is highly model-dependent rather than emerging reliably from identity prompts alone. Vicarious prompting with Llama 3.1 provides the closest approximation to human disagreement patterns across demographic axes. The findings have implications for using LLMs as proxies for diverse human annotators in content moderation tasks.
AI-Mediated Communication Can Steer Collective Opinion via LLM Editing Biases
This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.



