Entity · technique

HOLA (Hippocampal Linear Attention)

technique · active · provisional · hola-hippocampal-linear-attention--75ee63f3 · 1 event · first seen 20h ago

Aliases: HOLA (Hippocampal Linear Attention)

Co-occurring entities

WikiText-2 · LAMBADA · SlimPajama · Complementary Learning Systems · RULER

More like this (12)

positional attention heads · bidirectional attention · cross-attention · Locality-Sensitive Hashing Attention · attention head circuit · symbolic attention heads · DOA (Decoder-Only Attention) · Hope-attention · Lie-Algebra Attention · Multi-head Latent Attention (MLA) · reference attention · Differential Attention

Recent events (1)

6 · arXiv · cs.AI · 20h ago

HOLA adds hippocampal exact KV cache to linear attention, closing gap with full-attention Transformers

HOLA (Hippocampal Linear Attention) augments linear-attention and state-space models with a bounded, exact key-value cache inspired by Complementary Learning Systems theory. It targets the lossy compression of recurrent states, which causes earlier facts to be overwritten. The cache uses a residual-based eviction criterion (large beta * ||e||) with no learned eviction module, and a decoupled RMSNorm-gamma read for sharp retrieval. At 340M parameters, trained on 15B SlimPajama tokens, HOLA reduces WikiText perplexity from 27.32 to 22.92, which is lower than that of a full-attention Transformer++ baseline, and shows strong needle-in-a-haystack recall out to 32k tokens despite being trained only at 2k. The work bears directly on the open question of whether linear-attention models can match full attention on long-context retrieval tasks.
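The summary gives only the outline of the mechanism, so the following is a minimal pure-Python sketch of one plausible reading, not the paper's implementation. Everything beyond the summary is an assumption: the recurrent state is taken to be a delta-rule matrix, e is taken to be the state's prediction error for the incoming value, beta is taken to be the write strength, tokens with a large beta * ||e|| are assumed to be the ones retained, and the "RMSNorm-gamma read" is modelled as a softmax over RMS-normalized queries and keys scaled by a separate gain gamma. The names `BoundedExactCache`, `delta_rule_write`, and `hola_step` are hypothetical.

```python
import heapq
import math


def rms_normalize(x, eps=1e-6):
    """Divide a vector by its root-mean-square (RMSNorm without a gain)."""
    rms = math.sqrt(sum(t * t for t in x) / len(x) + eps)
    return [t / rms for t in x]


class BoundedExactCache:
    """Holds at most `capacity` exact (key, value) pairs, ranked by beta * ||e||."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._heap = []      # min-heap of (score, arrival_index, key, value)
        self._arrivals = 0   # unique tie-breaker so keys are never compared

    def offer(self, key, value, beta, residual):
        """Score a token and keep it only if it beats the weakest cached entry."""
        score = beta * math.sqrt(sum(r * r for r in residual))
        entry = (score, self._arrivals, list(key), list(value))
        self._arrivals += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the lowest score
        return score

    def entries(self):
        return [(k, v) for _, _, k, v in self._heap]

    def read(self, query, gamma):
        """Softmax read; gamma is an assumed gain that sharpens the weights."""
        if not self._heap:
            return None
        q = rms_normalize(query)
        pairs = self.entries()
        logits = [gamma * sum(a * b for a, b in zip(q, rms_normalize(k)))
                  for k, _ in pairs]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        dim = len(pairs[0][1])
        return [sum(w * v[d] for w, (_, v) in zip(weights, pairs)) / total
                for d in range(dim)]


def delta_rule_write(S, k, v, beta):
    """Update S in place with a delta-rule write; return the pre-write residual."""
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    e = [vi - pi for vi, pi in zip(v, pred)]
    for i, row in enumerate(S):
        for j in range(len(row)):
            row[j] += beta * e[i] * k[j]
    return e


def hola_step(S, cache, k, v, beta):
    """One token: write to the lossy state, then offer the token to the cache."""
    e = delta_rule_write(S, k, v, beta)
    return cache.offer(k, v, beta, e)
```

Under these assumptions, a token the state already predicts well yields a residual near zero and is the first to be displaced, while a surprising token is stored exactly. The min-heap keeps each admission or eviction at O(log C) for a cache of size C, with no learned component involved.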

Long Context Evolution WikiText-2 LAMBADA SlimPajama