Supervised Memory Training

technique · active · provisional · supervised-memory-training-0b78c137 · 1 event · first seen 12d ago

Aliases: Supervised Memory Training

Recent events (1)

6 · arXiv · cs.LG · 12d ago

Supervised Memory Training enables parallel RNN pretraining without backpropagation through time

A new arXiv preprint proposes Supervised Memory Training (SMT), a method that trains recurrent neural networks (RNNs) by reducing training to supervised learning on one-step memory transitions, avoiding backpropagation through time (BPTT) entirely. A Transformer-based encoder generates the memory labels via a predictive state objective, which allows training to run in parallel across time steps with an O(1) gradient path length between any two tokens. The preprint reports that SMT outperforms BPTT on language modeling and pixel sequence modeling tasks across multiple RNN architectures. By decoupling what the memory stores from how it is updated, the approach could help RNNs scale more effectively.
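The idea of supervising one-step memory transitions can be illustrated with a toy sketch. This is not the preprint's implementation: it assumes a scalar RNN cell, a squared-error loss, and a fixed `teacher` cell that stands in for the Transformer-based encoder so that there are memory labels to fit. All function names are hypothetical. What it shows is that once memory labels exist, each training example is a single `(m_prev, x) -> m_next` pair, so the loss and gradient never unroll through time.

```python
import math
import random


def rnn_step(params, m_prev, x):
    """Toy scalar RNN cell: m_t = tanh(w_m * m_prev + w_x * x + b)."""
    w_m, w_x, b = params
    return math.tanh(w_m * m_prev + w_x * x + b)


def label_memories(teacher, inputs, m0=0.0):
    """Return memory labels [m_0, ..., m_T].

    ASSUMPTION: a fixed teacher cell replaces the encoder from the summary,
    purely so the example has labels. It runs sequentially here; the
    summary describes an encoder that produces labels in parallel.
    """
    labels = [m0]
    for x in inputs:
        labels.append(rnn_step(teacher, labels[-1], x))
    return labels


def make_transitions(inputs, labels):
    """Turn a sequence into independent (m_prev, x, m_next) examples."""
    return [(labels[t], inputs[t], labels[t + 1]) for t in range(len(inputs))]


def transition_loss(params, transitions):
    """Mean squared error over one-step transitions."""
    total = sum((rnn_step(params, m, x) - tgt) ** 2 for m, x, tgt in transitions)
    return total / len(transitions)


def transition_grad(params, transitions):
    """Analytic gradient of the loss; each term involves a single step."""
    grad = [0.0, 0.0, 0.0]
    n = len(transitions)
    for m_prev, x, tgt in transitions:
        pred = rnn_step(params, m_prev, x)
        # d(loss)/d(pre-activation) for this one transition
        delta = 2.0 * (pred - tgt) * (1.0 - pred * pred) / n
        grad[0] += delta * m_prev
        grad[1] += delta * x
        grad[2] += delta
    return grad


def fit(transitions, steps=3000, lr=0.2):
    """Plain gradient descent on the transition loss, starting from zeros."""
    params = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = transition_grad(params, transitions)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    rng = random.Random(0)
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    labels = label_memories((0.5, 0.8, -0.1), inputs)
    transitions = make_transitions(inputs, labels)
    student = fit(transitions)
    print("final transition loss:", transition_loss(student, transitions))
```

Because every transition is an independent example, the list can be shuffled, batched, or split across workers without changing the loss, which is the property that makes time-parallel training possible in this toy setting.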