Language Identity Head Ablation

technique · active · provisional · language-identity-head-ablation-03c97e02 · 1 event · first seen 4h ago

Aliases: Language Identity Head Ablation

Co-occurring entities

First-Token Broadcasters: Mechanistic Origins of Language Identity and Distributed Robustness in Transformers · Qwen2.5-7B-Instruct-1M · Qwen2.5-1.5B-Base · GPT-2

More like this (12)

language-aware adapter heads · Local Modality Substitution · language-adaptive switch · vision-language grounding · The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model · symbolic attention heads · Abstraction Gap · The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs · IWSLT 2026 Cross-Lingual Voice Cloning · Language Generation in the Limit · From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation · Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

Recent events (1)

5 · arXiv · cs.CL · 4h ago

LIHA reveals first-token broadcaster heads as mechanistic source of language identity in transformers

Researchers introduce Language Identity Head Ablation (LIHA), a causal intervention that zeros individual attention heads to measure language-switching behavior across 2,700 prompt-language pairs in seven languages. Applied to GPT-2, LIHA identifies a small set of 'first-token broadcaster' heads that propagate language identity signals throughout generation, with compensatory redistribution following a hierarchical, feedforward pattern. A controlled comparison between Qwen2.5-1.5B-Base and Qwen2.5-1.5B-Instruct provides direct causal evidence that instruction tuning reorganizes language identity circuits toward early-layer localization. The findings offer mechanistic grounding for why multilingual models generate in the wrong language and why this is difficult to correct.

Evaluation and Benchmarking · Alignment and RLHF · First-Token Broadcasters: Mechanistic Origins of Language Identity and Distributed Robustness in Transformers · Language Identity Head Ablation · Qwen2.5-7B-Instruct-1M
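
The event summary describes LIHA as zeroing individual attention heads and measuring the effect. The following is a minimal sketch of that kind of zero ablation, not the authors' implementation. It assumes a toy single-layer causal attention block with random weights instead of a real model, and it uses the change in output norm as a stand-in for the language-switching measurement, which the summary does not specify in detail. The names `multi_head_attention`, `ablation_effects`, and `head_mask` are hypothetical.

```python
import numpy as np


def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, head_mask=None):
    """Causal multi-head self-attention over x of shape (seq, d_model).

    head_mask is a length-n_heads vector of 0/1 values; a 0 zeroes that
    head's output before the output projection (zero ablation).
    """
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(w):
        # Project, then reshape to (n_heads, seq, d_head).
        return (x @ w).reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(w_q), split(w_k), split(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Causal mask: position i may not attend to positions j > i.
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)

    per_head = attn @ v  # (n_heads, seq, d_head)
    if head_mask is not None:
        mask = np.asarray(head_mask, dtype=x.dtype)
        per_head = per_head * mask[:, None, None]

    concat = per_head.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ w_o


def ablation_effects(x, weights, n_heads):
    """Zero each head in turn and return the L2 change in the layer output.

    Assumption: output change is only a proxy here; the source measures
    language-switching behavior in generated text instead.
    """
    baseline = multi_head_attention(x, *weights, n_heads)
    effects = []
    for h in range(n_heads):
        mask = np.ones(n_heads)
        mask[h] = 0.0
        ablated = multi_head_attention(x, *weights, n_heads, head_mask=mask)
        effects.append(float(np.linalg.norm(baseline - ablated)))
    return effects


# Toy inputs (random, illustrative only).
rng = np.random.default_rng(0)
seq, d_model, n_heads = 5, 8, 4
x = rng.normal(size=(seq, d_model))
weights = tuple(rng.normal(size=(d_model, d_model)) for _ in range(4))

effects = ablation_effects(x, weights, n_heads)
ranked = sorted(range(n_heads), key=lambda h: effects[h], reverse=True)
print("heads ranked by ablation effect:", ranked)
```

In a real study the proxy metric would be replaced by a behavioral one, for example the fraction of prompt-language pairs whose generated output is in a different language than expected after a given head is zeroed.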