Entity · technique

backpropagation through time

technique · active · backpropagation-through-time-fcd475f2 · 2 events · first seen May 18, 2026

Aliases: backpropagation through time

Co-occurring entities

Supervised Memory Training · Pretraining Recurrent Networks without Recurrence · Mike Rabbat · Meta AI · world model · UC Berkeley · Aditi Krishnapriyan · Yann LeCun · Amir Bar · GRASP

More like this (12)

Temporal Difference Learning · Probabilistic Time Series Forecasting · Dynamic Time Warping · inference-time compute · Feedforward Neural Networks · autoregressive transformer · Hindsight Experience Replay · Iterated Amplification · Q-learning · gradient accumulation · Imitation Learning · inference-time compute scaling

Recent events (2)

6 · arXiv · cs.LG · Jun 5, 2026

Supervised Memory Training enables parallel RNN pretraining without backpropagation through time

A new arXiv preprint proposes Supervised Memory Training (SMT), a method that trains recurrent neural networks by reducing the problem to supervised learning on one-step memory transitions, bypassing backpropagation through time entirely. A Transformer-based encoder generates memory labels via a predictive state objective, enabling time-parallel training with O(1) gradient path length between any two tokens. SMT outperforms BPTT on language modeling and pixel sequence modeling tasks across multiple RNN architectures. The approach could enable RNNs to scale more effectively by decoupling memory content from update mechanics.

Training Infrastructure · backpropagation through time · Supervised Memory Training · Pretraining Recurrent Networks without Recurrence
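
For context on what the preprint above sets out to avoid, here is a minimal sketch of backpropagation through time on a toy scalar RNN. It is not taken from the preprint and does not implement SMT. The recurrence h_t = tanh(w * h_{t-1} + u * x_t), the squared-error loss on the final state, and all function names are assumptions chosen for illustration. The point is the backward loop: each step needs the gradient from the step after it, so the pass is sequential and the gradient path between two tokens grows with their distance.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Unroll h_t = tanh(w * h_{t-1} + u * x_t); return [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, target, h0=0.0):
    """Squared error on the final hidden state only (illustrative choice)."""
    return 0.5 * (forward(w, u, xs, h0)[-1] - target) ** 2

def bptt_grad_w(w, u, xs, target, h0=0.0):
    """dL/dw computed by walking the unrolled graph backwards in time."""
    hs = forward(w, u, xs, h0)
    dw = 0.0
    dh = hs[-1] - target                 # dL/dh_T
    for t in range(len(xs), 0, -1):      # sequential: step t needs dh from step t+1
        dpre = dh * (1.0 - hs[t] ** 2)   # back through tanh at step t
        dw += dpre * hs[t - 1]           # step t's contribution to dL/dw
        dh = w * dpre                    # pass the gradient on to h_{t-1}
    return dw

# Sanity check against a central finite difference (hypothetical toy inputs).
xs = [0.5, -1.0, 0.3, 0.8]
analytic = bptt_grad_w(0.7, 0.4, xs, target=0.2)
eps = 1e-6
numeric = (loss(0.7 + eps, 0.4, xs, 0.2) - loss(0.7 - eps, 0.4, xs, 0.2)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)
```

The factor w * (1 - h_t^2) is applied once per time step in the backward loop, which is why gradients computed this way can shrink or grow quickly as sequences get longer.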

6 · Berkeley AI Research (BAIR) Blog · May 18, 2026

GRASP: Gradient-based Planning for World Models at Longer Horizons

Researchers from Berkeley, Meta, and collaborators introduce GRASP, a gradient-based planner designed to make long-horizon planning with learned world models more robust. The method addresses three core failure modes: ill-conditioned computation graphs from backpropagation through time, non-greedy loss landscapes with many local minima, and brittle gradients through high-dimensional vision models. GRASP lifts trajectory optimization into virtual states for parallel optimization across time, injects stochasticity into state iterates for exploration, and reshapes gradients to avoid problematic state-input gradient paths. The work is positioned in the context of scaling world models toward general-purpose simulators usable for control and planning.

Long Context Evolution · Frontier Model Releases · Mike Rabbat · backpropagation through time · Meta AI · +7 more
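
The first failure mode named in the GRASP summary, ill-conditioned computation graphs from backpropagation through time, can be illustrated with a standard chain-rule expansion. This is a generic sketch, not notation from the GRASP post: the dynamics function f, the states s_t, and the actions a_t are assumed symbols for a learned world model rolled out as s_{t+1} = f(s_t, a_t).

```latex
% Sensitivity of the final state to the first action, for s_{t+1} = f(s_t, a_t):
\frac{\partial s_T}{\partial a_0}
  = J_{T-1}\, J_{T-2} \cdots J_1 \, B_0,
\qquad
J_t = \frac{\partial f}{\partial s}(s_t, a_t),
\quad
B_0 = \frac{\partial f}{\partial a}(s_0, a_0).

% The norm is bounded by a product of T-1 Jacobian norms:
\left\lVert \frac{\partial s_T}{\partial a_0} \right\rVert
  \le \left( \prod_{t=1}^{T-1} \lVert J_t \rVert \right) \lVert B_0 \rVert .
```

Because the bound is a product over the horizon, it can shrink or grow geometrically with T, so gradients for early actions become tiny or very large as the planning horizon lengthens.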