arXiv · cs.CL (Computation and Language) · 3d ago

LMs encode knowledge in task-specific parameter subsets, undermining the knowledge-base analogy

A new arXiv paper investigates whether language models satisfy the consistency property of knowledge bases: querying the same fact should return consistent results regardless of how the query is phrased. Behavioral and mechanistic analyses reveal that LMs encode knowledge in a task-specific manner. Facts acquired on one task frequently fail to transfer to other tasks during training, and distinct parameter subsets underlie the same fact across different tasks. The authors also show that chain-of-thought reasoning derives part of its effectiveness from engaging task-specific parameters beyond those tied to the evaluation task, a finding with implications for factual reliability and model controllability.
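The two properties at stake can be made concrete with a small sketch. The summary does not give the paper's metrics, so the definitions below are assumptions: consistency is scored as the fraction of query forms that agree with the majority answer, and the overlap between the parameter subsets that support one fact on two tasks is scored with Jaccard similarity. `TOY_MODEL` and the neuron labels are hypothetical stand-ins, not outputs from any real LM.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


def consistency_rate(answer: Callable[[str], str], query_forms: List[str]) -> float:
    """Assumed metric: fraction of query forms whose answer matches the majority answer."""
    if not query_forms:
        raise ValueError("need at least one query form")
    answers = [answer(q).strip().lower() for q in query_forms]
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed metric: overlap of two parameter subsets (1.0 identical, 0.0 disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy "model": a lookup table standing in for an LM.
TOY_MODEL: Dict[str, str] = {
    "The capital of France is": "Paris",
    "Q: What is the capital of France? A:": "Paris",
    "France's capital city, in one word:": "Lyon",  # an inconsistent answer
}

if __name__ == "__main__":
    rate = consistency_rate(TOY_MODEL.__getitem__, list(TOY_MODEL))
    print(f"consistency: {rate:.2f}")  # 2 of 3 forms agree -> 0.67

    # Hypothetical parameter subsets (e.g. top-attributed neurons) for one fact on two tasks.
    qa_params = {"L10.n42", "L11.n7", "L12.n99"}
    cloze_params = {"L3.n5", "L11.n7", "L20.n1"}
    print(f"overlap: {jaccard(qa_params, cloze_params):.2f}")  # 1 shared of 5 -> 0.20
```

Under the knowledge-base analogy, every fact would score 1.0 on consistency; task-specific encoding would show up, under this assumed metric, as low overlap between the subsets different tasks rely on.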

Evaluation and Benchmarking · AI Safety Research · LMs as Task-Specific Knowledge Bases: An Interpretability Analysis

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

arXiv · cs.LG · 6d ago

PAC-Bayes analysis establishes formal expressivity and alignment floors for prompt-conditioned LLMs

A new arXiv preprint models user-LLM interaction as a bilevel cheap-talk game and derives PAC-Bayes bounds showing two irreducible limitations: an 'expressivity floor', where the finite channel capacity of language renders some distinct tasks indistinguishable, and an 'objective-misalignment floor', where alignment constraints prevent the model from reaching the outputs a user would ideally want. The authors prove that prompt-conditioned LLMs cannot be universal problem solvers, because correct behavior on certain task families is provably unattainable even with infinite data, optimal training, or further model scaling. The work suggests multimodal inputs and external memory as potential mitigations, since both increase the task-relevant information bandwidth.
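For readers new to PAC-Bayes, one standard bound (McAllester's bound in the form refined by Maurer) is reproduced below for reference; it is not the bound derived in the paper. For a loss bounded in $[0,1]$, a prior $P$ fixed before seeing the data, and an i.i.d. sample $S$ of size $n \ge 8$, with probability at least $1-\delta$ the following holds simultaneously for all posteriors $Q$:

```latex
\mathbb{E}_{h \sim Q}\left[L(h)\right]
  \;\le\;
\mathbb{E}_{h \sim Q}\left[\hat{L}_S(h)\right]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

Here $L$ is the true risk and $\hat{L}_S$ the empirical risk on $S$. For fixed $Q$ and $P$ the complexity term goes to zero as $n \to \infty$ (roughly like $\sqrt{\ln n / n}$), whereas a 'floor' in the paper's sense is a gap that persists even in that infinite-data limit.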

Evaluation and Benchmarking · Alignment and RLHF · PAC-Bayes · On the Limits of Prompt-Conditioned Language Models as General-Purpose Learners

arXiv · cs.CL · 5d ago

Cross-lingual prompting strategies unlock hidden parametric knowledge in LLMs

A new arXiv preprint investigates how cross-lingual prompting can surface factual knowledge in multilingual LLMs that standard inference techniques fail to retrieve. The authors identify four dimensions of cross-lingual exploration that govern parametric knowledge retrieval and evaluate them on multilingual factual benchmarks spanning 17 typologically diverse languages. Results show that cross-lingual exploration improves both factual recall and cross-lingual consistency, and the authors claim it is more compute-efficient than scaling native-language inference.
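The two quantities the summary reports, factual recall and cross-lingual consistency, can be sketched as follows. The definitions are assumptions (exact-match recall and mean pairwise agreement across languages), and the answers are invented for illustration. `union_recall` is an oracle that peeks at the gold answers, so it only shows how much knowledge is reachable across languages; it is neither a method for choosing an answer nor the paper's procedure.

```python
from itertools import combinations
from typing import Dict, List


def recall(answers: List[str], gold: List[str]) -> float:
    """Fraction of facts answered correctly (exact match) in a single language."""
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)


def union_recall(answers_by_lang: Dict[str, List[str]], gold: List[str]) -> float:
    """Oracle upper bound: a fact counts as recalled if ANY language gets it right."""
    hits = 0
    for i, g in enumerate(gold):
        if any(answers[i] == g for answers in answers_by_lang.values()):
            hits += 1
    return hits / len(gold)


def pairwise_consistency(answers_by_lang: Dict[str, List[str]]) -> float:
    """Mean over language pairs of the fraction of facts answered identically."""
    langs = sorted(answers_by_lang)
    scores = []
    for a, b in combinations(langs, 2):
        xs, ys = answers_by_lang[a], answers_by_lang[b]
        scores.append(sum(x == y for x, y in zip(xs, ys)) / len(xs))
    return sum(scores) / len(scores)


# Hypothetical, already-normalized answers for three facts in three languages.
GOLD = ["paris", "tokyo", "ottawa"]
ANSWERS = {
    "en": ["paris", "osaka", "ottawa"],
    "de": ["paris", "tokyo", "toronto"],
    "sw": ["lyon", "tokyo", "ottawa"],
}

if __name__ == "__main__":
    for lang, ans in ANSWERS.items():
        print(lang, round(recall(ans, GOLD), 2))  # each language: 0.67
    print("union", union_recall(ANSWERS, GOLD))  # 1.0
    print("consistency", round(pairwise_consistency(ANSWERS), 2))  # 0.33
```

In this toy data each language alone recalls two of three facts, yet every fact is recalled in some language. Recall and consistency are distinct: consistency here is only 0.33 even though the union covers every fact.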

Evaluation and Benchmarking · Cross-Lingual Exploration for Parametric Knowledge

arXiv · cs.CL · 17h ago

Paper argues LLMs are a degenerate special case of world models, maps continuous spectrum from NTP to JEPA

A new arXiv preprint reframes the LLM-vs-world-model debate by arguing that LLMs are a degenerate special case of world models rather than a fundamentally different paradigm: the state space is the set of token sequences, and the only action is appending a token. The paper maps a continuous spectrum from next-token prediction through multi-token prediction, future-summary prediction, and next-latent prediction up to JEPA-style architectures. It identifies two open research challenges in moving along this spectrum: the data cliff between self-supervised text and action-labeled environments, and whether transformers generalize to continuous-state prediction or instead require a new architectural primitive. The work directly engages with Yann LeCun's 2022 argument that general intelligence requires abandoning autoregressive prediction.
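The two ends of the framing that are easiest to write down, the state/action view of an LM and the shift from next-token to multi-token targets, can be sketched as below. This illustrates the summary's description rather than reproducing code from the paper; `step` and `make_targets` are hypothetical helpers, and the later points on the spectrum (future-summary, next-latent, JEPA-style prediction) need a learned encoder and are not shown.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, ...]  # a state is the token sequence so far


def step(state: State, action: int) -> State:
    """Degenerate world-model transition: the only action appends one token."""
    return state + (action,)


def make_targets(tokens: Sequence[int], k: int = 1) -> List[Tuple[State, State]]:
    """(context, target) pairs where the target is the next k tokens.

    k == 1 is ordinary next-token prediction (NTP); k > 1 is multi-token prediction.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [
        (tuple(tokens[:t]), tuple(tokens[t:t + k]))
        for t in range(1, len(tokens) - k + 1)
    ]


if __name__ == "__main__":
    seq = [5, 8, 13, 21]  # hypothetical token ids
    print(step((5, 8), 13))  # (5, 8, 13)
    print(make_targets(seq, k=1))  # [((5,), (8,)), ((5, 8), (13,)), ((5, 8, 13), (21,))]
    print(make_targets(seq, k=2))  # [((5,), (8, 13)), ((5, 8), (13, 21))]
```

Raising `k` asks the model to commit to more of the future from each state, which is the first move along the spectrum the paper describes.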

From Tokens to States: LLMs as a Special Case of World Models and the Continuous Path Beyond · Yann LeCun · JEPA

arXiv · cs.AI · 1mo ago

Adversarial Subspace Alignment for Robust Multimodal Knowledge Editing in MLLMs

This paper addresses the generalization gap in multimodal large language model (MLLM) knowledge editing, where edits fail to propagate across semantically equivalent visual and linguistic variations of the same input. The authors introduce Latent Adversarial Robustification (LAR), which generates adversarial but semantically coherent variants in a joint latent space, and Rank-Constrained Subspace Learning (RCSL), which enforces low-rank alignment of the adversarial representations at the edit layer. Together these form the ASAM framework, which formalizes robustness via knowledge units that group semantically equivalent multimodal inputs. The authors report improved generality without sacrificing reliability or locality.
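Reliability, generality, and locality are the usual yardsticks in the knowledge-editing literature; the sketch below uses their common exact-match definitions, which may differ from the paper's evaluation protocol. The lookup tables are hypothetical, with bracketed strings standing in for image inputs. The toy edit succeeds on the edit prompt and leaves unrelated prompts alone but misses one equivalent variant, which is the kind of generalization gap ASAM targets.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]


def reliability(edited: Model, edit_prompt: str, target: str) -> float:
    """1.0 if the edited model gives the new answer on the edit prompt itself."""
    return float(edited(edit_prompt) == target)


def generality(edited: Model, equivalent_prompts: List[str], target: str) -> float:
    """Share of semantically equivalent variants (a 'knowledge unit') that also get the new answer."""
    return sum(edited(p) == target for p in equivalent_prompts) / len(equivalent_prompts)


def locality(edited: Model, original: Model, unrelated_prompts: List[str]) -> float:
    """Share of unrelated prompts whose answer the edit left unchanged."""
    return sum(edited(p) == original(p) for p in unrelated_prompts) / len(unrelated_prompts)


# Hypothetical lookup tables; "[img_a ...]" strings stand in for real image inputs.
ORIGINAL: Dict[str, str] = {
    "Who is pictured? [img_a]": "Alice",
    "Who is shown here? [img_a, cropped]": "Alice",
    "Name this person. [img_a, rotated]": "Alice",
    "What color is the sky?": "blue",
}
EDITED: Dict[str, str] = {
    **ORIGINAL,
    "Who is pictured? [img_a]": "Bob",
    "Who is shown here? [img_a, cropped]": "Bob",  # the rotated variant was missed
}

if __name__ == "__main__":
    unit = ["Who is shown here? [img_a, cropped]", "Name this person. [img_a, rotated]"]
    print(reliability(EDITED.__getitem__, "Who is pictured? [img_a]", "Bob"))  # 1.0
    print(generality(EDITED.__getitem__, unit, "Bob"))  # 0.5
    print(locality(EDITED.__getitem__, ORIGINAL.__getitem__, ["What color is the sky?"]))  # 1.0
```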

Alignment and RLHF · Multimodal Progress · Multimodal Large Language Models · Latent Adversarial Robustification (LAR) · knowledge editing

arXiv · cs.CL · 20d ago

Causal evaluation framework for learnability of formal language tasks in LMs

A new arXiv preprint proposes a causal framework for evaluating how much task-specific data a language model needs to learn a given task. The authors use formal languages generated by probabilistic finite automata as a controlled testbed and introduce an algebraic object, the 'binning semiring', to control how often a given property occurs in the training corpora. Experiments show that standard correlational evaluation practices lead to incorrect conclusions about learnability because of confounders, with implications for how task learning in natural language is studied.
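The controlled-testbed idea can be sketched in two steps: sample strings from a probabilistic finite-state automaton, then build training corpora in which a chosen property appears at a fixed rate, so that frequency can be varied as an intervention rather than merely observed. The automaton, the property (containing `bb`), and the helper names are hypothetical. The paper controls frequency with its binning semiring; this sketch swaps in plain rejection sampling, which is simpler but samples from the conditional distributions and can be slow for rare properties.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical 2-state PFSA over {a, b}. Each state maps to
# (stop probability, [(symbol, next_state, probability), ...]); each row sums to 1.
PFSA: Dict[int, Tuple[float, List[Tuple[str, int, float]]]] = {
    0: (0.1, [("a", 1, 0.6), ("b", 0, 0.3)]),
    1: (0.2, [("a", 1, 0.3), ("b", 0, 0.5)]),
}


def sample_string(rng: random.Random, start: int = 0, max_len: int = 50) -> str:
    """Draw one string by walking the PFSA until it stops (or hits max_len)."""
    state, out = start, []
    while len(out) < max_len:
        stop_p, arcs = PFSA[state]
        r = rng.random()
        if r < stop_p:
            break
        r -= stop_p
        for symbol, nxt, p in arcs:
            if r < p:
                break
            r -= p
        # If no arc was chosen (float rounding), the loop variables keep the last arc.
        out.append(symbol)
        state = nxt
    return "".join(out)


def corpus_with_frequency(rng: random.Random, has_property: Callable[[str], bool],
                          n: int, freq: float) -> List[str]:
    """Naive rejection sampling, NOT the paper's binning semiring: keep drawing
    PFSA strings until exactly round(n * freq) of the n strings have the property."""
    want_pos = round(n * freq)
    pos: List[str] = []
    neg: List[str] = []
    while len(pos) < want_pos or len(neg) < n - want_pos:
        s = sample_string(rng)
        if has_property(s):
            if len(pos) < want_pos:
                pos.append(s)
        elif len(neg) < n - want_pos:
            neg.append(s)
    corpus = pos + neg
    rng.shuffle(corpus)
    return corpus


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = corpus_with_frequency(rng, lambda s: "bb" in s, n=1000, freq=0.25)
    print(len(corpus), sum("bb" in s for s in corpus))  # 1000 250
```

Training otherwise-identical models on corpora built with different `freq` values, and comparing how well each learns the property, is the interventional comparison that a purely correlational evaluation skips.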

Evaluation and Benchmarking · Kullback-Leibler divergence · Causally Evaluating the Learnability of Formal Language Tasks · binning semiring

arXiv · cs.CL · 13d ago

Revisiting LLM systematicity in negation understanding via in-context learning

A new arXiv preprint analyzes how well large language models handle negation from two angles: behavioral systematicity (whether models correctly recognize negation expressions and scope) and representational systematicity (whether function vectors can be reliably constructed from in-context examples). Results show LLMs partially succeed at negation cue recognition via in-context learning but struggle with scope recognition, with performance varying by output format. Function vectors can be composed for cue extraction but are harder to extract for scope recognition tasks.

Evaluation and Benchmarking · Revisiting the Systematicity in Negation in the Era of In-Context Learning

arXiv · cs.AI · 17d ago

Study finds shared pattern-matching mechanisms underlie both human and LLM everyday reasoning errors

A new arXiv paper evaluates human participants and 25 LLMs on commonsense causal reasoning tasks and finds similar error patterns in both groups. The authors identify specific attention heads that drive LLM responses by implementing pattern matching, and show that these heads can predict human reasoning errors caused by superficially irrelevant prompt details. The findings challenge the common assumption that human reasoning relies on principled abstract world models while LLMs merely pattern-match, suggesting instead that both may rely on a shared underlying mechanism.

Evaluation and Benchmarking · AI Safety Research · Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning

arXiv · cs.CL · 28d ago

Language Models Learn Constructional Semantics, Not To Mention Syntax: Investigating LM Understanding of Paired-Focus Constructions

This paper investigates whether language models can learn the semantics of rare English constructions (e.g., 'let alone', 'much less'), using a novel dataset built to test understanding of these form-meaning pairings. Testing models that vary in parameter count, architecture, and pretraining dataset size, the authors find that modestly sized open-source models can grasp the semantics of Paired-Focus constructions, while models trained on human-scale data fail to. Training dynamics analysis reveals that semantic understanding of these constructions emerges later than syntactic knowledge and correlates with gains in world knowledge more broadly.

Evaluation and Benchmarking · Open Weights Progress · Paired-Focus Constructions · constructional semantics · scalar adjectival semantics