Entity · person

Leslie Pack Kaelbling

person · active · leslie-pack-kaelbling-9d7e168f · 1 event · first seen May 18, 2026

Aliases: Leslie Pack Kaelbling

Co-occurring entities

Divide-and-Conquer Value Learning · Berkeley AI Research (BAIR) · GRPO · PPO · Aditya (co-lead author) · Temporal Difference Learning · Floyd-Warshall Algorithm · Goal-Conditioned Reinforcement Learning · Q-learning

More like this (12)

Spelke Core Knowledge Systems · Andrew Ng · StackLLaMA · Christopher J. Kelly · Daron Acemoglu · Jay Kreps · Jared Kaplan · Yann LeCun · Daniel Miessler · Lisa E. Gordon-Hagerty · Katelyn Lesse · K-Dense-AI/scientific-agent-skills

Recent events (1)

6 · Berkeley AI Research (BAIR) Blog · May 18, 2026

RL without TD Learning: Divide-and-Conquer Value Learning for Long-Horizon Off-Policy RL

A BAIR blog post introduces a divide-and-conquer paradigm for off-policy reinforcement learning that avoids the error accumulation of temporal difference (TD) learning by reducing the number of Bellman recursions from linear to logarithmic in the horizon. The approach exploits the triangle-inequality structure of goal-conditioned RL to define a transitive Bellman update rule, which lets value learning scale to long-horizon tasks. The authors claim this is the first practical realization of divide-and-conquer value learning at scale in goal-conditioned RL, building on an idea that traces back to Kaelbling (1993). The post frames the approach as a third paradigm alongside TD and Monte Carlo methods, addressing a key gap in scalable off-policy RL.
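The transitive update summarized above can be illustrated with a small tabular sketch. This is not the blog post's method, which uses learned value functions; it is a minimal illustration under strong assumptions: a deterministic graph, unit step costs, and an exact distance table. The function name `transitive_value_iteration` and the chain environment are hypothetical. Each round applies the triangle-inequality rule d(s, g) ← min over w of d(s, w) + d(w, g), in the spirit of the Floyd-Warshall algorithm, so the horizon covered by the table doubles per round.

```python
import math


def transitive_value_iteration(n_states, edges):
    """Tabular sketch of a transitive (divide-and-conquer) update.

    d[s][g] holds the current estimate of the number of steps from s to g.
    Assumes a deterministic graph with unit-cost edges (an illustrative
    simplification, not the setting of the blog post).
    """
    inf = math.inf
    # Initialize with one-step knowledge only: 0 on the diagonal, 1 on edges.
    d = [[0 if s == g else inf for g in range(n_states)]
         for s in range(n_states)]
    for s, t in edges:
        d[s][t] = min(d[s][t], 1)

    rounds = 0
    while True:
        # Combine two shorter segments through every midpoint w.
        # Because d[s][s] == 0, the old estimate d[s][g] is always a candidate.
        new_d = [[min(d[s][w] + d[w][g] for w in range(n_states))
                  for g in range(n_states)]
                 for s in range(n_states)]
        if new_d == d:
            return d, rounds
        d = new_d
        rounds += 1


# A 9-state chain 0 -> 1 -> ... -> 8, so the longest path has 8 steps.
chain_edges = [(i, i + 1) for i in range(8)]
dist, rounds = transitive_value_iteration(9, chain_edges)
print(dist[0][8], rounds)  # prints "8 3"
```

On this chain, a one-step backup would need 8 sweeps to propagate information from state 8 back to state 0, while the transitive rule resolves every pair in 3 rounds, matching log2(8).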

Evaluation and Benchmarking · Agent and Tool Ecosystem · Leslie Pack Kaelbling · Divide-and-Conquer Value Learning · Berkeley AI Research (BAIR) · +8 more