Temporal Difference Learning
temporal-difference-learning-8bf2aee7 · 1 event · first seen 1mo ago
Aliases: Temporal Difference Learning
Recent events (1)
RL without TD Learning: Divide-and-Conquer Value Learning for Long-Horizon Off-Policy RL
A BAIR blog post introduces a divide-and-conquer paradigm for off-policy reinforcement learning. It avoids the error accumulation of temporal difference (TD) learning by making the number of Bellman recursions grow logarithmically, rather than linearly, with the horizon. The approach uses the triangle-inequality structure of goal-conditioned RL to define a transitive Bellman update rule, which lets value learning scale to long-horizon tasks. The authors claim this is the first practical realization of divide-and-conquer value learning at scale in goal-conditioned RL, building on an idea that traces back to Kaelbling (1993). The post frames the method as a third paradigm alongside TD and Monte Carlo methods, one that addresses a key gap in scalable off-policy RL.
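The linear-versus-logarithmic contrast can be illustrated with a small tabular sketch. This is a toy under stated assumptions, not the method from the post: it assumes a deterministic chain of 33 states with unit step costs, stores temporal distances d(s, g) instead of values (with V = γ^d, maximizing a product of values corresponds to minimizing a sum of distances), and takes the minimum over every intermediate state w exactly. The names `min_plus`, `sweeps_to_fixed_point`, and `step` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def min_plus(a, b):
    """Min-plus product: out[s, g] = min over w of a[s, w] + b[w, g]."""
    return (a[:, :, None] + b[None, :, :]).min(axis=1)

def sweeps_to_fixed_point(update, table):
    """Apply `update` until the table stops changing.

    Returns the final table and the number of sweeps that changed it.
    """
    sweeps = 0
    while True:
        new = update(table)
        if np.array_equal(new, table):
            return table, sweeps
        table, sweeps = new, sweeps + 1

# Assumed toy environment: a chain 0 -> 1 -> ... -> 32 with unit step cost.
n = 33
step = np.full((n, n), np.inf)                   # inf marks "no known path"
np.fill_diagonal(step, 0.0)                      # d(s, s) = 0
step[np.arange(n - 1), np.arange(1, n)] = 1.0    # d(s, s + 1) = 1

# TD-style backup: d(s, g) <- min_w step(s, w) + d(w, g).
# Each sweep extends the known horizon by a single step.
one_step_d, one_step_sweeps = sweeps_to_fixed_point(
    lambda d: min_plus(step, d), step
)

# Transitive backup: d(s, g) <- min_w d(s, w) + d(w, g).
# Each sweep joins two already-solved halves, doubling the known horizon.
transitive_d, transitive_sweeps = sweeps_to_fixed_point(
    lambda d: min_plus(d, d), step
)

print(one_step_sweeps, transitive_sweeps)  # prints: 31 5
```

Both updates reach the same distance table, but the longest distance here is 32 steps, so the one-step backup needs 31 sweeps while the transitive backup needs only 5. With learned, approximate values, fewer chained backups means fewer opportunities for errors to compound, which is the motivation the post gives for the approach.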