Entity · paper

Pretraining Recurrent Networks without Recurrence

paper · active · pretraining-recurrent-networks-without-recurrence-c56b666f · 1 event · first seen Jun 5, 2026

Aliases: Pretraining Recurrent Networks without Recurrence

Co-occurring entities

backpropagation through time · Supervised Memory Training

More like this (12)

Self-Supervised Pretraining · temporally ordered pre-training · Recurrent Neural Network · Unsupervised Pre-training · Prior-Data Fitted Networks · T^2MLR: Transformer with Temporal Middle-Layer Recurrence · q0: Primitives for Hyper-Epoch Pretraining · Do You Really Need to Pretrain Q-Functions for Online RL Fine-Tuning? · Associative Recurrent Memory Transformer · Understanding Reasoning from Pretraining to Post-Training · multimodal pretraining · Contrastive Pre-training

Recent events (1)

6 · arXiv · cs.LG · Jun 5, 2026

Supervised Memory Training enables parallel RNN pretraining without backpropagation through time

A new arXiv preprint proposes Supervised Memory Training (SMT), a method that trains recurrent neural networks by reducing the problem to supervised learning on one-step memory transitions, bypassing backpropagation through time entirely. A Transformer-based encoder generates memory labels via a predictive state objective, enabling time-parallel training with O(1) gradient path length between any two tokens. SMT outperforms BPTT on language modeling and pixel sequence modeling tasks across multiple RNN architectures. The approach could enable RNNs to scale more effectively by decoupling memory content from update mechanics.
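
The core idea in the summary above, fitting a recurrent cell to one-step memory transitions instead of unrolling it through time, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the preprint: the function names, the NumPy implementation, the tanh cell, and in particular `encoder_labels`, which stands in for the Transformer-based encoder with a simple causal running mean rather than a predictive state objective. What it does show is that once memory labels m_1..m_T exist, every transition (m_{t-1}, x_t) -> m_t is an independent supervised example, so the loss for all time steps is computed in one batched pass and no gradient flows from one step to another.

```python
import numpy as np

def encoder_labels(x):
    """Hypothetical stand-in for the label-generating encoder.

    Returns a causal running mean of the inputs, computed for all t at once.
    The preprint uses a Transformer encoder; this is only a placeholder.
    """
    steps = np.arange(1, x.shape[0] + 1)[:, None]
    return np.cumsum(x, axis=0) / steps

def one_step_loss_and_grads(W, U, x, m):
    """MSE between cell(m_{t-1}, x_t) and the label m_t, for all t in parallel."""
    # Previous-memory inputs are labels (constants), with m_0 = 0.
    prev = np.vstack([np.zeros((1, m.shape[1])), m[:-1]])
    pred = np.tanh(prev @ W.T + x @ U.T)
    err = pred - m
    loss = float(np.mean(err ** 2))
    # Gradient w.r.t. the pre-activation; no term crosses time steps.
    dpre = 2.0 * err * (1.0 - pred ** 2) / err.size
    return loss, dpre.T @ prev, dpre.T @ x

def train_one_step_cell(T=64, d=4, steps=200, lr=0.5, seed=0):
    """Fit the toy cell with plain gradient descent; returns (initial, final) loss."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, size=(T, d))
    m = encoder_labels(x)
    W = 0.1 * rng.standard_normal((d, d))
    U = 0.1 * rng.standard_normal((d, d))
    initial, _, _ = one_step_loss_and_grads(W, U, x, m)
    for _ in range(steps):
        _, gW, gU = one_step_loss_and_grads(W, U, x, m)
        W -= lr * gW
        U -= lr * gU
    final, _, _ = one_step_loss_and_grads(W, U, x, m)
    return initial, final
```

Calling `train_one_step_cell()` returns the loss before and after training. Because the cell only ever sees label memories as input during training, the sketch says nothing about how well the cell behaves when rolled out on its own predictions at inference time.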

Training Infrastructure · backpropagation through time · Supervised Memory Training · Pretraining Recurrent Networks without Recurrence