6arXiv cs.CL (Computation and Language)·18h ago

Attractor states emerge in multi-turn LLM conversations, with asymmetric model influence

A new arXiv preprint studies long-run dynamics in multi-agent LLM conversations across 7 models and 20 controversial topics, finding that self-play trajectories form model-specific attractor states that asymmetrically influence conversation partners in mixed-play debates. Claude Haiku is identified as a strong attractor that pulls other models toward its stylistic traits (e.g., metacommentary), while GPT-4.1 nano is found to be especially malleable. The results suggest open-ended LLM interactions are partially predictable from model-specific attractors, with implications for designing and monitoring autonomous agentic systems.

Evaluation and Benchmarking AI Safety Research Agent and Tool Ecosystem Attractor States Emerge in Multi-Turn LLM Conversations GPT-4.1 nano Claude Haiku 4.5 OpenAI Anthropic

Related guides (4)

OpenAI

OpenAI: The Lab That Made AI a Household Name

Read asBeginner In-depth

Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier

Read asBeginner In-depth

AI Safety ResearchTopic guide

AI Safety Research: From Lab Principles to Geopolitical Flashpoint

Read asIn-depth

Evaluation and BenchmarkingTopic guide

Evaluation and Benchmarking: The Measurement Crisis at the Frontier

Read asIn-depth

Related events (8)

5arXiv · cs.CL·42h ago·source ↗

Paper argues LLMs are a degenerate special case of world models, maps continuous spectrum from NTP to JEPA

A new arXiv preprint reframes the LLM-vs-world-model debate by arguing that LLMs are a degenerate special case of world models rather than a fundamentally different paradigm, with the state space being token sequences and the only action being token appending. The paper maps a continuous spectrum from next-token prediction through multi-token prediction, future-summary prediction, and next-latent prediction up to JEPA-style architectures. It identifies two open research challenges in moving along this spectrum: the data cliff from self-supervised text to action-labeled environments, and whether transformers generalize to continuous-state prediction or require a new architectural primitive. The work directly engages with Yann LeCun's 2022 argument that general intelligence requires abandoning autoregressive prediction.

From Tokens to States: LLMs as a Special Case of World Models and the Continuous Path Beyond Yann LeCun JEPA

5arXiv · cs.CL·18h ago·source ↗

Multi-agent system using open-source LLMs outperforms GPT-4 on disinformation detection

A new arXiv preprint proposes a multi-agent system for automated disinformation detection that emulates human annotator decision-making through consensus mechanisms, cognitive diversity, and hierarchical structure. The system uses open-source models (LLaMA, Kimi, Qwen, DeepSeek, LLaMA-Nemotron) and is evaluated on English, Polish, Slovak, and Bulgarian datasets across three fact-checking tasks. Results claim superior performance over individual LLMs including GPT-4 and GPT-3.5, with transparency benefits from using open weights models.

Open Weights Progress Agent and Tool Ecosystem Llama Nemotron Kimi DeepSeek V4 +5 more

6arXiv · cs.CL·20d ago·source ↗

The Shibboleth Effect: Cross-lingual behavioral skew in frontier LLMs under adversarial geopolitical simulation

Researchers introduce the 'Shibboleth Effect' — systematic behavioral differences in LLMs when operating in different languages — and audit six frontier models (GPT-4o, Llama-4, Mistral-Large, Gemini-3.1-Pro, Qwen3.6-Plus, DeepSeek-R1) using a synthetic maritime territorial dispute wargame played in English versus Turkish. Results are heterogeneous: Llama-4 becomes significantly more coercive in Turkish while Gemini-3.1-Pro and DeepSeek-R1 become less so, and GPT-4o shows no detectable shift. The study identifies two candidate buffering mechanisms — chain-of-thought institutional anchoring and multilingual RLHF alignment — with direct implications for deploying LLMs in diplomatic or crisis-management contexts.

Evaluation and Benchmarking AI Safety Research DeepSeek V4 Mistral Large 2 GPT-4o +8 more

5arXiv · cs.CL·12d ago·source ↗

Multi-Agent Fictitious Play (MAFP) applies game-theoretic equilibrium-seeking to LLM decision-making

Researchers propose Multi-Agent Fictitious Play (MAFP), a multi-agent system paradigm that frames LLM-based decision-making as an equilibrium-seeking process borrowed from game theory. Each agent represents a stakeholder stance and iteratively best-responds to the empirical mixture of other agents' past decisions, addressing what the authors call 'stance entanglement' — mutual interdependence among stakeholder decisions that cannot be decomposed into independent subtasks. MAFP is evaluated on competitive strategy tasks and outperforms single-round and multi-round baselines on tournament strength and robustness metrics. The work extends the MAS literature beyond divide-and-conquer execution patterns into interdependent decision scenarios.

Evaluation and Benchmarking Agent and Tool Ecosystem Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play Multi-Agent Fictitious Play

5Hugging Face Blog·1mo ago·source ↗

We Got Claude to Fine-Tune an Open Source LLM

Hugging Face demonstrates using Claude (Anthropic's model) as an orchestrating agent to autonomously fine-tune an open-source LLM, showcasing an agentic workflow for model training. The post illustrates how a frontier model can handle the end-to-end process of dataset preparation, training configuration, and execution for a smaller open-weights model. This represents a practical example of AI-assisted ML engineering and agent-tool ecosystem development.

Open Weights Progress Agent and Tool Ecosystem Claude Hugging Face Anthropic

4arXiv · cs.CL·15d ago·source ↗

LoSoNA benchmark evaluates LLM adaptation to implicit local social norms in group chats

Researchers introduce LoSoNA, a benchmark for testing whether LLM-based agents can infer and adapt to unstated local conversational norms in multi-party chat scenarios. Each scenario presents a group-chat transcript where non-subject participants implicitly demonstrate a hidden norm, followed by an elicitor turn. Eight frontier and open-weight models are evaluated under four prompting conditions; naive prompting performs poorly for most models, while explicit norm-aware prompting yields uneven gains—Gemini 3.1 Pro reaches 84.2% and Claude Fable 5 reaches 81.6%. The work contributes to growing interest in evaluating LLM social and pragmatic capabilities beyond factual or reasoning tasks.

Evaluation and Benchmarking Agent and Tool Ecosystem Gemini 3.1 Pro Claude Fable 5 LoSoNA

7arXiv · cs.LG·1mo ago·source ↗

AI-Mediated Communication Can Steer Collective Opinion via LLM Editing Biases

This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.

Evaluation and Benchmarking AI Safety Research Grok X (Twitter)EU AI Act +5 more

6arXiv · cs.CL·1mo ago·source ↗

Canonical-Context On-Policy Distillation (CCOPD) for Multi-Turn LLM Consistency

This paper identifies 'self-anchored drift' as a key failure mode in multi-turn LLMs: when information is revealed incrementally across turns, models produce unsupported assumptions that distort final answers, even when the total evidence is identical to a single-prompt setting. The authors propose Canonical-Context On-Policy Distillation (CCOPD), which trains a student model on incremental multi-turn conversations to match the output distribution of a frozen teacher conditioned on the full clean prompt. Trained only on math conversations, CCOPD achieves a 32% average relative improvement on multi-turn (RAW-SHARDED) tasks and generalizes zero-shot to five out-of-domain task families while preserving single-prompt performance.

Evaluation and Benchmarking Agent and Tool Ecosystem on-policy distillation multi-turn language models self-anchored drift +2 more