Entity · technique

Temporal Difference Learning

technique · active · temporal-difference-learning-8bf2aee7 · 1 event · first seen May 18, 2026

Aliases: Temporal Difference Learning

Co-occurring entities

Leslie Pack Kaelbling · Divide-and-Conquer Value Learning · Berkeley AI Research (BAIR) · GRPO · PPO · Aditya (co-lead author) · Floyd-Warshall Algorithm · Goal-Conditioned Reinforcement Learning · Q-learning

More like this (12)

A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise · backpropagation through time · contrastive learning · Imitation Learning · Q-learning · temporal glitch detection · Dynamic Time Warping · T^2MLR: Transformer with Temporal Middle-Layer Recurrence · Connectionist Temporal Classification · Differential Attention · Contrastive Pre-training · Probabilistic Time Series Forecasting

Recent events (1)

Berkeley AI Research (BAIR) Blog · May 18, 2026

RL without TD Learning: Divide-and-Conquer Value Learning for Long-Horizon Off-Policy RL

A BAIR blog post introduces a divide-and-conquer paradigm for off-policy reinforcement learning that avoids the error accumulation of temporal difference (TD) learning, whose one-step bootstrapped targets let errors compound over long horizons. Instead of a number of Bellman recursions that grows linearly with the horizon, the approach needs only a logarithmic number. It exploits the triangle-inequality structure of goal-conditioned RL to define a transitive Bellman update rule, which lets value learning scale to long-horizon tasks. The authors claim this is the first practical realization of divide-and-conquer value learning at scale in goal-conditioned RL, building on an idea traceable to Kaelbling (1993). The post frames the approach as a third paradigm alongside TD and Monte Carlo methods, addressing a key gap in scalable off-policy RL.
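To make the linear-versus-logarithmic contrast concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the BAIR authors' method. It assumes a hypothetical deterministic chain of n states where every move costs one step, and it tracks shortest-path distances d(s, g) instead of learned values. With a discount γ, the value V(s, g) = γ^d(s, g) turns the min-plus update below into V(s, g) ← max_w V(s, w)·V(w, g). The helper names `chain_edges`, `td_backups`, and `transitive_backups` are hypothetical. The TD-style update extends the correct horizon by one step per sweep. The transitive update d(s, g) ← min_w [d(s, w) + d(w, g)] can double it each round.

```python
import math

INF = math.inf


def chain_edges(n):
    # Hypothetical toy environment: n states on a line; moving to a neighbor costs 1 step.
    return {s: [t for t in (s - 1, s + 1) if 0 <= t < n] for s in range(n)}


def initial_table(n, edges):
    # Both methods start from one-step knowledge only:
    # d(s, s) = 0, d(s, neighbor) = 1, every other pair unknown (inf).
    d = [[INF] * n for _ in range(n)]
    for s in range(n):
        d[s][s] = 0
        for t in edges[s]:
            d[s][t] = 1
    return d


def td_backups(n, edges):
    # TD-style one-step Bellman backup (synchronous sweeps):
    #   d(s, g) <- min(d(s, g), 1 + min_{s' in N(s)} d(s', g))
    # Each sweep extends the correct horizon by one step.
    # Returns the converged table and the number of sweeps that changed it.
    d = initial_table(n, edges)
    sweeps = 0
    while True:
        new = [
            [min(d[s][g], 1 + min((d[t][g] for t in edges[s]), default=INF)) for g in range(n)]
            for s in range(n)
        ]
        if new == d:
            return d, sweeps
        d, sweeps = new, sweeps + 1


def transitive_backups(n, edges):
    # Divide-and-conquer (transitive) update from the triangle inequality:
    #   d(s, g) <- min_w [d(s, w) + d(w, g)]
    # Taking w = s keeps the current estimate, so the update never gets worse.
    # Each synchronous round can double the correct horizon.
    d = initial_table(n, edges)
    rounds = 0
    while True:
        new = [[min(d[s][w] + d[w][g] for w in range(n)) for g in range(n)] for s in range(n)]
        if new == d:
            return d, rounds
        d, rounds = new, rounds + 1


if __name__ == "__main__":
    n = 65  # chain of 65 states, so the longest horizon is 64 steps
    edges = chain_edges(n)
    d_td, td_sweeps = td_backups(n, edges)
    d_tr, tr_rounds = transitive_backups(n, edges)
    assert d_td == d_tr  # both converge to the same shortest-path distances
    print(td_sweeps, tr_rounds)  # 63 6
```

On this 64-step chain, the TD-style backups need 63 changing sweeps beyond the one-step initialization, while the transitive rule needs ⌈log2 64⌉ = 6 rounds, and both reach the same table. The reduction in recursions is not free: each transitive round minimizes over every candidate midpoint w (O(n^3) work per round here, versus O(n^2) for a TD sweep on this sparse chain).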

Evaluation and Benchmarking · Agent and Tool Ecosystem · Leslie Pack Kaelbling · Divide-and-Conquer Value Learning · Berkeley AI Research (BAIR)