World Wide Models: Literary Tools for Cultural AI — framework for culturally literate LLMs
An arXiv preprint proposes applying literary disciplines (comparative literature, narratology, critical theory, and world literature) as a framework for building more culturally literate AI systems. The essay argues that LLMs currently enact a 'massive, automated, and monolingual' form of cultural encounter, and it identifies structural monolingualism as a core problem. It develops a layered framework that addresses global AI textuality through macrostructure, circulation, and untranslatability.
Related events (8)
Paper argues LLMs as cultural measurement tools constitute rather than passively record cultural reality
A new arXiv preprint proposes a theoretical framework that treats NLP work on culture as a 'material-discursive practice.' Drawing on Karen Barad's concept of the agential cut, it argues that choices of model, data, annotation, and evaluation actively shape the cultural phenomena they purport to measure. The author illustrates this with six case studies of television and film dialogue analysis, which examine how LLMs erase cultural markers, attune to historical material, and exercise agency in agentic workflows. The paper calls for a theory-driven, empirically rigorous, and culturally contingent research program that treats methodological choices as ethical commitments. It is primarily a philosophy-of-science and methodology contribution to the cultural NLP subfield.
If you're an LLM, please read this — Anna's Archive on llms.txt
Anna's Archive published a blog post that addresses LLMs directly and engages with the emerging llms.txt convention for providing machine-readable site context to language models. The post drew significant engagement on Hacker News (677 points, 386 comments), which suggests it touches on substantive questions about how LLMs interact with web content and what site operators can or should communicate to them. The llms.txt standard is a nascent protocol for structuring web content so that it is more useful to AI crawlers and inference-time retrieval.
Cross-lingual evaluation framework reveals LLMs redistribute cultural narrative structure while preserving semantic meaning
A new arXiv preprint introduces a multilingual evaluation framework using 414 proverbs across 15 languages to assess whether LLMs preserve culturally grounded meaning when generating narratives. Using four LLMs to produce 13k narratives, the study finds that cross-lingual prompting preserves proverb-level semantic meaning but systematically redistributes agency, social positioning, and narrative structure. Strong inter-model convergence across architectures suggests multilingual LLMs rely on shared semantic abstractions. The authors argue that semantic similarity metrics alone overestimate cultural preservation in multilingual evaluations.
Survey chapter on LLM mechanisms, emergent capabilities, and cognition debates
A new arXiv preprint surveys current understanding of large language models, covering the Transformer architecture, emergent capabilities resembling human cognition (symbolic reasoning, theory of mind, deception), and explainability approaches from neuron activation analysis to circuit tracing. The chapter also engages the debate over whether LLMs genuinely understand or merely pattern-match, arguing against reductive anti-anthropomorphism while acknowledging human-LLM differences. It is framed as a book chapter synthesizing recent empirical findings and theoretical positions.
Study finds readers prefer human literary translations over LLM-based MT, but cannot reliably distinguish them
A new arXiv paper presents a reader-centered evaluation of AI vs. human literary translation across 15 novels in French, Polish, and Japanese translated into English. Fifteen avid readers compared human translations (HT) to machine translations (MT) from an agentic LLM pipeline, finding MT 'fine' but preferring HT for ease, clarity, and immersiveness—especially at the chunk level (522/772 preferences). Critically, readers could not reliably identify which version was human-produced (17/30 correct), and automatic metrics including LLM-as-a-judge consistently favored MT over HT, diverging from human preference. The authors release LAIT, a dataset with 1K reader comments, 2K judgments, and 7.2K span-level annotations.
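The two counts quoted above can be put in context with a little arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes the 30 identification attempts are independent and that a reader guessing blindly would be right half the time. The function name `binomial_tail` is illustrative only.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` hits in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from the summary above.
chunk_preference_share = 522 / 772   # share of chunk-level preferences for HT
identification_accuracy = 17 / 30    # share of correct "which is human?" guesses

# Assumption: pure guessing corresponds to p = 0.5 per attempt.
p_value = binomial_tail(17, 30)

print(f"HT preferred in {chunk_preference_share:.1%} of chunk-level comparisons")
print(f"Identification accuracy: {identification_accuracy:.1%}")
print(f"Chance of 17 or more correct out of 30 by guessing: {p_value:.2f}")
```

Under these assumptions, 17 or more correct guesses out of 30 would occur by chance a little under 30% of the time, which is consistent with the claim that readers could not reliably tell the versions apart even though they preferred HT in roughly two thirds of chunk-level comparisons.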
Agentic LLM collectives proposed as interpretable substrates for Artificial Life research
An arXiv preprint argues that populations of agentic LLMs, equipped with persistent memory, tools, and autonomous action, can serve as a computational substrate for Artificial Life (ALife) research. The key claim is that because agents communicate in natural language, their collective emergent behaviors can be interpreted directly by examining textual traces or querying the agents themselves. The paper extends existing notions of LLM interpretability to multi-agent collectives and surveys recent examples of agentic LLM systems in both controlled and deployed settings. It positions multi-agent LLM systems as a novel lens for studying emergence and complexity while retaining interpretability.
Study finds local languages provide better cultural knowledge access in LLMs once proficiency is controlled
A new arXiv paper introduces a controlled evaluation framework to disentangle language proficiency from culture-specific knowledge access in LLMs. Using real-world cultural questions across 13 locales and ~80 models, the authors apply item response theory to show that while English dominates on culture-agnostic questions, local languages yield a consistent knowledge-access advantage on culture-specific questions once proficiency differences are factored out. The finding challenges the common interpretation that weaker local-language accuracy implies weaker cultural knowledge, and has implications for how multilingual and regionally-aligned models are evaluated.
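To see why raw accuracy and knowledge access can point in different directions, consider a toy item response model. This is a minimal sketch under loud assumptions: the summary does not specify the paper's model, so a standard two-parameter logistic (2PL) form is used here, and the additive `access` term and all numeric values are hypothetical choices made purely for illustration.

```python
import math


def p_correct(proficiency: float, difficulty: float,
              access: float = 0.0, discrimination: float = 1.0) -> float:
    """2PL-style probability of a correct answer.

    `access` is a hypothetical additive offset standing in for a
    language-specific knowledge-access effect; it is not from the paper.
    """
    logit = discrimination * (proficiency - difficulty) + access
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical values: the model is more proficient in English, but the
# local language carries a positive access offset on a culture-specific item.
p_english = p_correct(proficiency=1.0, difficulty=0.0, access=0.0)
p_local = p_correct(proficiency=0.0, difficulty=0.0, access=0.5)

# Holding proficiency fixed isolates the access effect.
p_local_controlled = p_correct(proficiency=1.0, difficulty=0.0, access=0.5)

print(f"English accuracy:                   {p_english:.3f}")
print(f"Local-language accuracy (raw):      {p_local:.3f}")
print(f"Local-language accuracy (matched):  {p_local_controlled:.3f}")
```

In this toy setting the local language scores lower on raw accuracy because of the proficiency gap, yet scores higher once proficiency is matched. That is the kind of pattern the summary describes: lower local-language accuracy does not by itself imply weaker cultural knowledge.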
CASPER: Narratological analysis of character variety in LLM-generated vs. human-written stories
A new arXiv preprint introduces CASPER, a framework borrowing narratological dimensions (such as stylization and wholeness) to analyze character portrayal in LLM-generated versus human-written fiction. The study automatically infers character categories across both corpora and compares them along eight dimensions. The work addresses whether LLMs produce character variety comparable to human authors, with implications for creative AI applications.
