paper

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact



Recent events (1)

arXiv · cs.CL

LLM psychological profiles are largely measurement artifacts, not model properties

A new arXiv preprint administers a battery of personality and risk-preference instruments to 56 instruction-tuned LLMs alongside large human reference samples, finding that 81-90% of between-model variation is explained by directional response bias rather than the traits the instruments target. The authors introduce the concept of 'response orthogonality' to explain why some instruments appear more reliable than others, and show that apparent psychological profiles can be manufactured through item selection. The findings challenge the validity of using human-designed psychometric tools to characterize LLMs, with direct implications for safety assessment and the use of LLMs as proxies for human participants in research.
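To make the mechanism concrete, here is a toy sketch of how a purely directional response bias can look like a trait difference once items are selected unevenly. It is an illustration under stated assumptions, not the preprint's method: the six-item instrument, its keying pattern, the model names, and the bias values are all hypothetical, and each simulated responder has no underlying trait at all.

```python
from statistics import mean

# Hypothetical 1-5 Likert instrument: +1 = positively keyed item, -1 = reverse-keyed item.
ITEM_KEYS = [+1, -1, +1, -1, +1, -1]


def respond(bias: float, n_items: int) -> list[float]:
    """Trait-free responder: every answer is the scale midpoint shifted by a directional bias."""
    answer = min(5.0, max(1.0, 3.0 + bias))  # clamp to the 1-5 response range
    return [answer] * n_items


def scale_score(responses: list[float], keys: list[int]) -> float:
    """Mean item score after reverse-scoring negatively keyed items (6 - r on a 1-5 scale)."""
    return mean(r if k > 0 else 6.0 - r for r, k in zip(responses, keys))


def profile(bias: float, keys: list[int]) -> float:
    """Score a trait-free responder with the given bias on the given set of item keys."""
    return scale_score(respond(bias, len(keys)), keys)


# Hypothetical models that differ only in how strongly they lean toward agreement.
biases = {"model_a": -1.0, "model_b": 0.0, "model_c": 1.5}

# Balanced keying: the directional bias cancels, so every model scores at the midpoint.
balanced = {name: profile(b, ITEM_KEYS) for name, b in biases.items()}

# Item selection: keep only positively keyed items, and the bias reappears as a "profile".
positive_only = [k for k in ITEM_KEYS if k > 0]
selected = {name: profile(b, positive_only) for name, b in biases.items()}

print("balanced instrument:", balanced)
print("positive-keyed items only:", selected)
```

In this toy setup, the balanced instrument cannot distinguish the three models, while the positively keyed subset ranks them by bias alone. Real instruments and real model responses are noisier, so this only shows the direction of the effect described in the summary.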

Evaluation and Benchmarking · AI Safety Research