paper
Pretraining Recurrent Networks without Recurrence
paper · active · provisional
pretraining-recurrent-networks-without-recurrence-c56b666f · 1 event · first seen 12d ago
Aliases: Pretraining Recurrent Networks without Recurrence
More like this (12)
Self-Supervised Pretraining
temporally ordered pre-training
Recurrent Neural Network
Unsupervised Pre-training
q0: Primitives for Hyper-Epoch Pretraining
multimodal pretraining
Contrastive Pre-training
Disentangled RNNs
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
instruction-based multitask pretraining
large neural network training
FlashbackCL: Mitigating Temporal Forgetting in Federated Learning
Recent events (1)
Supervised Memory Training enables parallel RNN pretraining without backpropagation through time
A new arXiv preprint proposes Supervised Memory Training (SMT), a method that trains recurrent neural networks by reducing the problem to supervised learning on one-step memory transitions, bypassing backpropagation through time (BPTT) entirely. A Transformer-based encoder generates the memory labels via a predictive state objective, which enables time-parallel training with an O(1) gradient path length between any two tokens. According to the preprint, SMT outperforms BPTT on language modeling and pixel sequence modeling tasks across multiple RNN architectures. The approach could help RNNs scale more effectively by decoupling what the memory should contain from how the memory is updated.
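
The reduction to one-step memory transitions can be illustrated with a toy sketch. This is not the preprint's implementation: the label generator here is a hypothetical stand-in (an exponential moving average) rather than a Transformer encoder trained with a predictive state objective, the cell is a two-parameter scalar update, and all function names are invented for illustration. It only shows that once memory labels exist, each transition (previous label, input, target label) becomes an independent supervised example, so no gradient has to flow through time.

```python
import random


def make_labels(xs, decay=0.5):
    """Hypothetical stand-in for the label generator (an exponential moving
    average). The source describes a Transformer encoder for this role."""
    labels, m = [], 0.0
    for x in xs:
        m = decay * m + (1.0 - decay) * x
        labels.append(m)
    return labels


def one_step_pairs(xs, labels):
    """Turn a sequence into independent (m_prev, x_t, m_target) triples.
    The previous memory is taken from the labels, not from the cell itself."""
    prev = [0.0] + labels[:-1]
    return list(zip(prev, xs, labels))


def fit_cell(pairs, lr=0.5, epochs=500):
    """Fit the toy cell m_t = a * m_prev + b * x_t by gradient descent on the
    mean squared error over transitions. Every triple contributes its own
    gradient term, so the inner loop could run in parallel over time steps."""
    a, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for m_prev, x, target in pairs:
            err = a * m_prev + b * x - target
            grad_a += 2.0 * err * m_prev / n
            grad_b += 2.0 * err * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
labels = make_labels(xs)
a, b = fit_cell(one_step_pairs(xs, labels))
print(a, b)
```

In this toy setup the labels are exactly realizable by the cell, so the fitted `a` and `b` should approach the stand-in's decay of 0.5. A real memory update would be a learned RNN cell with vector-valued state, and the quality of the labels would depend on the encoder that produces them.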