Almanac
person

Leslie Pack Kaelbling

personactiveleslie-pack-kaelbling-9d7e168f·1 events·first seen 1mo ago

Aliases: Leslie Pack Kaelbling

Co-occurring entities

More like this (12)

Recent events (1)

6Berkeley Ai Research (Bair) Blog·1mo ago·source ↗

RL without TD Learning: Divide-and-Conquer Value Learning for Long-Horizon Off-Policy RL

A BAIR blog post introduces a divide-and-conquer paradigm for off-policy reinforcement learning that avoids temporal difference (TD) learning's error accumulation problem by reducing Bellman recursions logarithmically rather than linearly. The approach leverages the triangle inequality structure of goal-conditioned RL to define a transitive Bellman update rule, enabling value learning that scales to long-horizon tasks. The authors claim this is the first practical realization of divide-and-conquer value learning at scale in goal-conditioned RL settings, building on an idea traceable to Kaelbling (1993). The post frames this as a third paradigm alongside TD and Monte Carlo methods, addressing a key gap in scalable off-policy RL.