technique
WY-form triangular chunk solver
technique · active · provisional
wy-form-triangular-chunk-solver-8a969def · 1 event · first seen 3d ago
Aliases: WY-form triangular chunk solver
Recent events (1)
CARVE: Content-aware gating for linear attention recurrent models improves efficiency and quality over GDN-2
CARVE (Content-Aware Recurrent with Value Efficiency) is a new linear attention architecture that addresses three coupled defects in the GDN-2 delta-rule architecture by restricting erasure to the key axis rather than the value axis. This design choice is proven necessary and sufficient for the WY-form triangular chunk solver to apply, which in turn gives training throughput competitive with Transformers. At 1.3B parameters trained on 100B tokens, CARVE achieves lower perplexity than GDN-2, leads recurrent baselines on nine commonsense reasoning benchmarks, and sets the state of the art on RULER retrieval probes, while using 13% less peak memory and 19% fewer parameters at 0.4% throughput overhead.
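The summary above does not give the solver's equations, so the following is only a rough illustration of what a WY-form triangular chunk solver does. It assumes the plain, ungated delta rule S_t = S_{t-1} + beta_t k_t (v_t - S_{t-1}^T k_t)^T, which is not CARVE's actual recurrence. Under that assumption, the per-token corrections U for a whole chunk satisfy the unit lower-triangular system (I + diag(beta) tril(K K^T, -1)) U = diag(beta) (V - K S_0), and the chunk's final state is S_0 + K^T U. The function names and the toy data are hypothetical.

```python
# Sketch under stated assumptions: plain (ungated) delta rule, pure Python lists.
# Not CARVE's recurrence; the names below are illustrative only.

def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def delta_rule_sequential(S0, K, V, beta):
    """Reference: update the (d_k x d_v) state one token at a time."""
    S = [row[:] for row in S0]
    for k, v, b in zip(K, V, beta):
        # Current prediction S^T k, computed before the state is updated.
        pred = [sum(k[i] * S[i][j] for i in range(len(k))) for j in range(len(v))]
        for i in range(len(k)):
            for j in range(len(v)):
                S[i][j] += b * k[i] * (v[j] - pred[j])
    return S

def delta_rule_chunk_wy(S0, K, V, beta):
    """Chunk form: forward-substitute the triangular system for U, then S = S0 + K^T U."""
    KS0 = matmul(K, S0)  # (C x d_v): predictions made from the chunk's initial state
    U = []
    for t in range(len(K)):
        row = [V[t][j] - KS0[t][j] for j in range(len(V[t]))]
        for i in range(t):  # strictly lower-triangular part of K K^T
            kk = sum(a * b for a, b in zip(K[t], K[i]))
            for j in range(len(row)):
                row[j] -= kk * U[i][j]
        U.append([beta[t] * x for x in row])
    K_T = [list(col) for col in zip(*K)]
    KU = matmul(K_T, U)  # (d_k x d_v): accumulated rank-1 updates
    return [[S0[i][j] + KU[i][j] for j in range(len(S0[0]))] for i in range(len(S0))]

# Toy chunk: 3 tokens, d_k = 2, d_v = 1.
K = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
V = [[1.0], [2.0], [3.0]]
beta = [0.5, 1.0, 0.25]
S0 = [[0.1], [0.2]]

S_seq = delta_rule_sequential(S0, K, V, beta)
S_wy = delta_rule_chunk_wy(S0, K, V, beta)
max_err = max(abs(a - b) for ra, rb in zip(S_seq, S_wy) for a, b in zip(ra, rb))
```

On this toy chunk the two results should agree up to floating-point rounding (`max_err` near zero). The point of the chunk form is that the token-by-token dependency is confined to one small triangular solve per chunk, while the remaining work is matrix products that parallelize well. A practical implementation would batch these as tensor operations rather than Python loops.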