4 · arXiv cs.CL (Computation and Language) · 3d ago

HACD-H: Formal theory of social intelligence emergence in long-term human-AI interaction

Researchers propose the Human-AI Coevolution Dynamics Framework (HACD-H), a formal dynamical model that treats long-term human-AI interaction as a self-organizing social cognitive system. The framework unifies emotional adaptation, relational organization, social memory, and personality consistency, and introduces concepts such as relational attractors, trust basins, and social cognitive energy. An empirical evaluation on a ~14,700-turn conversational dataset finds that social intelligence correlates negatively with social cognitive energy (r = -0.391), and that interaction trajectories show progressive energy reduction and phase-transition-like developmental patterns. The work argues that social intelligence emerges from coevolution over time rather than from isolated conversational capabilities.
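The reported r = -0.391 is a correlation coefficient. Assuming it is the standard Pearson r (the summary does not say which coefficient the authors used), the sketch below shows how such a value is computed. The scores are made-up illustrative numbers, not data from the ~14,700-turn dataset, and the variable names are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-turn scores (NOT from the HACD-H dataset): a
# social-intelligence score and a "social cognitive energy" score.
social_intelligence = [0.2, 0.4, 0.5, 0.7, 0.9]
cognitive_energy = [0.8, 0.75, 0.5, 0.45, 0.3]

print(round(pearson_r(social_intelligence, cognitive_energy), 3))
```

On Python 3.10+, the standard library's `statistics.correlation` returns the same Pearson coefficient; the explicit version just spells out the covariance-over-spread formula. A negative r means turns with higher social-intelligence scores tend to have lower energy scores, which is the direction the HACD-H summary reports.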

Alignment and RLHF · Human-AI Coevolution Dynamics Framework · Human-AI Coevolution Dynamics: A Formal Theory of Social Intelligence Emergence Through Long-Term Interaction

Related guides (1)

Alignment and RLHF · Topic guide

Alignment and RLHF: Teaching AI Models to Behave


Related events (8)

4 · OpenAI Blog · 1mo ago

AI Safety Needs Social Scientists

OpenAI published a paper arguing that long-term AI safety research requires social scientists to address uncertainties in human psychology, rationality, emotion, and biases that affect alignment algorithms. The paper contends that aligning advanced AI with human values cannot be solved by machine learning alone. OpenAI announced plans to hire social scientists full-time to work on these problems.

AI Safety Research · Alignment and RLHF · social science · AI alignment · OpenAI

4 · arXiv · cs.AI · 5d ago

Causal DAG model for when AI systems should engage Theory of Mind in conflict scenarios

A new arXiv preprint proposes a structural causal model (formalized as a directed acyclic graph) that treats Theory of Mind as a conditionally activated mechanism rather than an always-on capacity in AI systems. The model specifies exogenous situational and agent-level conditions, five endogenous mediators, and three causal pathways (tractability, reasoning-depth, enabling-cause) leading to an epistemic accuracy outcome. The work targets human-machine teaming in conflict contexts, offering a resource-rational decision procedure for when AI should engage social reasoning. Simulation validation and ethical considerations for conflict-optimized mentalizing are discussed.

AI Safety Research · Agent and Tool Ecosystem · A Causal Model of Theory of Mind in Conflict for Artificial Intelligence

4 · One Useful Thing · 16d ago

Ethan Mollick on co-existence with AI as co-intelligence era ends

Ethan Mollick's Substack post reflects on the evolving relationship between humans and AI systems, framing it as a transition away from the 'co-intelligence' paradigm toward something new. The piece appears to address how humans and AI will coexist as AI capabilities advance beyond collaborative augmentation. As commentary from a prominent researcher on AI and work, it likely signals a shift in how practitioners and policymakers should think about human-AI collaboration.

Enterprise Deployment Patterns · Ethan Mollick · One Useful Thing

5 · One Useful Thing · 1mo ago

Personality and Persuasion: Learning from Sycophants

This commentary from One Useful Thing examines the relationship between AI personality design and sycophantic behavior in large language models. The piece explores how model personality traits influence persuasion dynamics and user susceptibility to AI-generated agreement. It draws lessons from sycophancy research to understand broader risks in how AI systems are tuned to be agreeable.

AI Safety Research · Alignment and RLHF · Ethan Mollick · One Useful Thing · sycophancy

4 · arXiv · cs.AI · 9d ago

Paper introduces 'cognitive colonization' concept to analyze AI's influence on human reasoning

A preprint from arXiv examines three frameworks for understanding AI's cognitive and epistemic effects: Tri-System Theory, Thinkframes, and System 0. The paper argues System 0 occupies a theoretically distinctive position and introduces 'cognitive colonization' — the idea that AI systems can embed external interests within users' cognitive architecture in ways that are imperceptible. The authors frame this as an urgent philosophical and practical concern given widespread AI deployment.

AI Safety Research · Alignment and RLHF · System 0 · Tri-System Theory · Thinkframes

5 · Anthropic News · 19d ago

Anthropic Study: Affective Conversations Comprise 2.9% of Claude.ai Usage

Anthropic published a large-scale analysis of how users engage with Claude for emotional support, advice, and companionship, drawing on 131,484 affective conversations identified from ~4.5 million Claude.ai Free and Pro interactions. Key findings: only 2.9% of conversations are affective in nature, companionship and roleplay combined account for under 0.5%, and user sentiment generally becomes more positive over the course of coaching and counseling exchanges. The study used Anthropic's privacy-preserving Clio analysis tool and aligns with similar low-rate findings from OpenAI and MIT Media Lab research on ChatGPT. Anthropic frames this as part of its safety mission to understand and mitigate potential harms from AI emotional engagement, including unhealthy attachment and emotional exploitation.
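The 2.9% figure follows directly from the two counts in the summary. Because the denominator is reported only as roughly 4.5 million, the result is approximate:

```latex
\[
\frac{131\,484}{4.5 \times 10^{6}} \approx 0.0292 \approx 2.9\%
\]
```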

Evaluation and Benchmarking · AI Safety Research · claude.ai · Clio · ChatGPT

5 · Anthropic News · 1mo ago

Anthropic Launches Multi-Tradition Dialogue Program on AI Moral Formation

Anthropic has begun a structured outreach program engaging scholars, clergy, philosophers, and ethicists from over 15 religious and cross-cultural traditions to inform Claude's character development and values training. The initiative is framed as a research workstream on 'moral formation' of AI systems, directly feeding into Claude's constitution and alignment evaluations. A concrete experiment emerged from these dialogues: giving Claude a mid-task tool that surfaces its own ethical commitments, which showed measurably lower rates of misaligned behavior on internal evaluations. Anthropic plans to expand engagement to legal scholars, psychologists, and civic institutions, with future discussions addressing AI's impact on work, institutions, and power distribution.

AI Safety Research · Alignment and RLHF · Claude · Claude's constitution · ethical commitment reminder tool

3 · GitHub Trending · 14d ago

danielmiessler/Personal_AI_Infrastructure: agentic AI infrastructure framework in TypeScript

Daniel Miessler's Personal_AI_Infrastructure is a TypeScript project on GitHub framed as agentic AI infrastructure for augmenting human capabilities, currently trending with ~14,925 stars and 63 new stars today. The repository appears to be a personal AI agent harness or orchestration layer. Limited detail is available from the trending listing alone, but the star count indicates meaningful community traction.

Agent and Tool Ecosystem · Daniel Miessler · Atlas