Entity · benchmark

LAMBADA

benchmark · active · provisional · lambada-ee2418fd · 2 events · first seen 2d ago

Aliases: LAMBADA

Co-occurring entities

WikiText-2 · SlimPajama · Complementary Learning Systems · HOLA (Hippocampal Linear Attention) · RULER · OpenWebText · Explicit Fuzzy Logic in the Feed-Forward Layer: Self-Forgetting Quantifiers Discover Legible Grammatical-Licensing Detectors · NC-FFN

More like this (12)

LAMBDA LACUNA LACUNA LIMA LLaDA LAMDA-CL LamPO LUCAS Bamba Llama DanceOPD DreamLM

Recent events (2)

6 · arXiv · cs.AI · 19h ago

HOLA adds hippocampal exact KV cache to linear attention, closing gap with full-attention Transformers

HOLA (Hippocampal Linear Attention) augments linear-attention and state-space models with a bounded, exact key-value cache inspired by Complementary Learning Systems theory. It targets the lossy compression that causes earlier facts to be overwritten in a recurrent state. The cache uses a residual-based eviction criterion (large β·‖e‖) with no learned eviction module, and a decoupled RMSNorm-gamma read for sharp retrieval. At 340M parameters, trained on 15B SlimPajama tokens, HOLA reduces WikiText perplexity from 27.32 to 22.92, which is lower than a full-attention Transformer++ baseline, and shows strong needle-in-a-haystack recall out to 32k tokens despite training only at a 2k context length. The work bears directly on the open question of whether linear-attention models can match full attention on long-context retrieval tasks.

Long Context Evolution · WikiText-2 · LAMBADA · SlimPajama
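The bounded cache described in the HOLA summary above can be illustrated with a minimal sketch. It is not the authors' implementation. The summary does not say whether a large β·‖e‖ marks an entry for retention or for eviction, so this sketch assumes that high-scoring entries are retained and the lowest-scoring entry is evicted. The class name `BoundedResidualCache` and its methods are hypothetical.

```python
import heapq
import math


class BoundedResidualCache:
    """Hypothetical bounded exact KV cache scored by beta * ||e||.

    Assumption (not confirmed by the source): entries with the largest
    score are kept, and the lowest-scoring entry is evicted when full.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []   # min-heap of (score, insertion_index, key, value)
        self._count = 0   # unique tie-breaker, also records insertion order

    def offer(self, key, value, beta, residual):
        # Score = beta times the Euclidean norm of the residual vector e.
        score = beta * math.sqrt(sum(x * x for x in residual))
        entry = (score, self._count, key, value)
        self._count += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            # Cache is full: replace the current lowest-scoring entry.
            heapq.heapreplace(self._heap, entry)

    def keys(self):
        # Return the retained keys in insertion order.
        return [k for _, _, k, _ in sorted(self._heap, key=lambda e: e[1])]


cache = BoundedResidualCache(capacity=2)
cache.offer("a", 1, beta=1.0, residual=[3.0, 4.0])  # score 5.0
cache.offer("b", 2, beta=0.5, residual=[0.0, 2.0])  # score 1.0
cache.offer("c", 3, beta=1.0, residual=[0.0, 3.0])  # score 3.0, replaces "b"
```

A min-heap keeps the weakest entry at the root, so each offer costs O(log n) and needs no learned eviction module, which matches the summary's description only at that high level.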

6 · arXiv · cs.CL · 2d ago

Negation-capable fuzzy logic FFN replacement yields interpretable grammatical licensing detectors in transformers

Researchers propose replacing the standard transformer feed-forward sublayer with explicit fuzzy set operations (intersection and set-difference), creating a negation-capable FFN (NC-FFN) whose hidden units carry interpretable logical form. At the 125M-parameter scale on OpenWebText, NC-FFN matches the perplexity of a GELU baseline while remaining legible by construction. Adding soft sequence quantifiers with learned forgetting rates closes the remaining deficits in grammatical licensing and produces units that fire detectably on grammatical licensors (comparatives, passive participles, negative-polarity items) without dictionary learning. The work advances mechanistic interpretability by providing a parameter-neutral architecture whose computations are readable as grammatical mechanisms.

Evaluation and Benchmarking · LAMBADA · OpenWebText · Explicit Fuzzy Logic in the Feed-Forward Layer: Self-Forgetting Quantifiers Discover Legible Grammatical-Licensing Detectors
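The two fuzzy set operations named in the NC-FFN summary above can be illustrated on scalar membership degrees in [0, 1]. The summary does not state which operator family the paper uses, so this sketch assumes the standard min-based (Gödel) definitions. The function names are hypothetical.

```python
def fuzzy_intersection(a, b):
    """Membership in 'A and B' under the min t-norm (assumed operator)."""
    return min(a, b)


def fuzzy_difference(a, b):
    """Membership in 'A and not B': intersect A with the complement 1 - b."""
    return min(a, 1.0 - b)


# A unit that is strongly in A (0.75) and half in B (0.5):
both = fuzzy_intersection(0.75, 0.5)    # 0.5
a_not_b = fuzzy_difference(0.75, 0.5)   # 0.5
```

Set-difference is what gives such a layer negation: a unit can be active only when one feature is present and another is absent.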