lambada-ee2418fd · 2 events · first seen
Aliases: LAMBADA
HOLA (Hippocampal Linear Attention) augments linear-attention and state-space models with a bounded, exact key-value cache inspired by Complementary Learning Systems theory. It targets the lossy-compression problem in which earlier facts are overwritten in the recurrent state. The cache uses a residual-based eviction criterion (large beta * ||e||) rather than a learned eviction module, and a decoupled RMSNorm-gamma read for sharp retrieval. At 340M parameters trained on 15B SlimPajama tokens, HOLA reduces Wikitext perplexity from 27.32 to 22.92, lower than that of a full-attention Transformer++ baseline, and shows strong needle-in-a-haystack recall out to 32k tokens despite being trained only on 2k-token contexts. The work bears directly on the open question of whether linear-attention models can match full attention on long-context retrieval tasks.
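The state-plus-cache idea can be illustrated with a small, hypothetical pure-Python sketch; it is not the HOLA implementation. It assumes a delta-rule recurrent state S updated as S <- S + beta * e k^T with residual e = v - S k, reads the eviction criterion as "keep the entries with the largest beta * ||e||" (evicting the lowest score once the cache is full), and combines the state read with softmax attention over the cache by simple addition. The class name `HybridMemory` and its methods are illustrative, and the RMSNorm-gamma read is not modeled.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class HybridMemory:
    """Hypothetical delta-rule state plus a bounded exact key-value cache."""

    def __init__(self, d_k: int, d_v: int, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.d_k, self.d_v, self.capacity = d_k, d_v, capacity
        # Recurrent state S with shape (d_v, d_k), initialised to zero.
        self.state = [[0.0] * d_k for _ in range(d_v)]
        # Exact cache entries stored as (score, key, value).
        self.cache: list[tuple[float, Vector, Vector]] = []

    def _state_read(self, k: Vector) -> Vector:
        # Computes S @ k.
        return [dot(row, k) for row in self.state]

    def write(self, k: Vector, v: Vector, beta: float) -> float:
        # Residual: the part of v the compressed state fails to predict for key k.
        e = [vi - pi for vi, pi in zip(v, self._state_read(k))]
        # Delta-rule update (assumed): S <- S + beta * e k^T.
        for i in range(self.d_v):
            for j in range(self.d_k):
                self.state[i][j] += beta * e[i] * k[j]
        score = beta * math.sqrt(dot(e, e))  # beta * ||e||
        entry = (score, list(k), list(v))
        if len(self.cache) < self.capacity:
            self.cache.append(entry)
        else:
            # Replace the lowest-scoring entry only if the new one scores higher.
            worst = min(range(len(self.cache)), key=lambda i: self.cache[i][0])
            if score > self.cache[worst][0]:
                self.cache[worst] = entry
        return score

    def read(self, q: Vector) -> Vector:
        out = self._state_read(q)
        if self.cache:
            logits = [dot(q, k) / math.sqrt(self.d_k) for _, k, _ in self.cache]
            m = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(z - m) for z in logits]
            total = sum(weights)
            # Add softmax attention over the exact cache to the state read.
            for w, (_, _, v) in zip(weights, self.cache):
                for i in range(self.d_v):
                    out[i] += (w / total) * v[i]
        return out


if __name__ == "__main__":
    mem = HybridMemory(d_k=2, d_v=2, capacity=2)
    mem.write([1.0, 0.0], [1.0, 0.0], beta=1.0)  # new fact: score 1.0
    mem.write([1.0, 0.0], [1.0, 0.0], beta=1.0)  # already predicted: score 0.0
    mem.write([0.0, 1.0], [0.0, 1.0], beta=0.5)  # displaces the 0.0 entry
    mem.write([0.0, 1.0], [0.0, 1.0], beta=0.1)  # score 0.05, not admitted
    print(sorted(s for s, _, _ in mem.cache))  # [0.5, 1.0]
    print(mem.read([1.0, 0.0]))
```

Under these assumptions, tokens the recurrent state already predicts well (small residual) are the first to be displaced, so the fixed cache budget is spent on the facts the compressed state is most likely to lose.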
Researchers propose replacing the standard transformer feed-forward sublayer with explicit fuzzy set operations (intersection and set difference), yielding a negation-capable FFN (NC-FFN) whose hidden units have an interpretable logical form. At 125M scale on OpenWebText, NC-FFN matches the perplexity of a GELU baseline while remaining legible by construction. Adding soft sequence quantifiers with learned forgetting rates remedies the model's deficits on grammatical licensing and produces units that fire detectably on grammatical licensors (comparatives, passive participles, negative-polarity items), identifiable without dictionary learning. The work advances mechanistic interpretability by providing a parameter-neutral architecture whose computations can be read as grammatical mechanisms.
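The fuzzy operations and the quantifier can be illustrated with a small, hypothetical pure-Python sketch; it is not the paper's implementation. It assumes sigmoid membership functions over linear projections, the product t-norm for intersection, set difference read as "A AND NOT B" (a * (1 - b)), and one plausible form of a soft existential quantifier whose `retain` parameter stands in for 1 minus a learned forgetting rate. All function names are illustrative, and the sketch makes no attempt to match the baseline's parameter count.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fuzzy_and(a: float, b: float) -> float:
    return a * b  # product t-norm (assumed choice)


def fuzzy_diff(a: float, b: float) -> float:
    return a * (1.0 - b)  # A \ B read as "A AND NOT B"


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def nc_ffn(x: Vector, w_a: Matrix, w_b: Matrix, w_out: Matrix, kinds: list[str]):
    """Hypothetical negation-capable FFN: each hidden unit is a fuzzy
    intersection or set difference of two learned membership functions."""
    mu_a = [sigmoid(z) for z in matvec(w_a, x)]
    mu_b = [sigmoid(z) for z in matvec(w_b, x)]
    hidden = []
    for a, b, kind in zip(mu_a, mu_b, kinds):
        if kind == "and":
            hidden.append(fuzzy_and(a, b))
        elif kind == "diff":
            hidden.append(fuzzy_diff(a, b))
        else:
            raise ValueError(f"unknown unit kind: {kind!r}")
    return matvec(w_out, hidden), hidden


def soft_exists(p_seq: Vector, retain: float) -> Vector:
    """Soft 'some earlier token had property p'; retain = 1 - forgetting rate
    (a hypothetical parameterisation)."""
    q, out = 0.0, []
    for p in p_seq:
        # Decayed probabilistic OR of the running truth value and the current p.
        q = 1.0 - (1.0 - retain * q) * (1.0 - p)
        out.append(q)
    return out


if __name__ == "__main__":
    y, h = nc_ffn([0.0], [[0.0], [0.0]], [[0.0], [0.0]], [[1.0, 1.0]], ["and", "diff"])
    print(h, y)  # [0.25, 0.25] [0.5]
    print(soft_exists([0.0, 1.0, 0.0, 0.0], retain=0.5))  # [0.0, 1.0, 0.5, 0.25]
```

With `retain = 1` the quantifier reduces to a running OR over earlier tokens; smaller values let evidence of a licensor (for example, an earlier negation) fade over the sequence, which is the kind of behavior a learned forgetting rate would control.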