Entity · person

Aditya (co-lead author)

person · active · aditya-co-lead-author--df6afb8b · 1 event · first seen May 18, 2026

Aliases: Aditya (co-lead author)

Co-occurring entities

Leslie Pack Kaelbling · Divide-and-Conquer Value Learning · Berkeley AI Research (BAIR) · GRPO · PPO · Temporal Difference Learning · Floyd-Warshall Algorithm · Goal-Conditioned Reinforcement Learning · Q-learning

More like this (12)

Adithya S K Aditya G. Parameswaran Aditi Krishnapriyan Ada Co-Scientist Andrew Ng Daron Acemoglu Vas Narasimhan Adam Dravid et al., 2023 Sam Altman David Chen

Recent events (1)

6 · Berkeley AI Research (BAIR) Blog · May 18, 2026

RL without TD Learning: Divide-and-Conquer Value Learning for Long-Horizon Off-Policy RL

A BAIR blog post introduces a divide-and-conquer paradigm for off-policy reinforcement learning. It aims to avoid the error accumulation of temporal difference (TD) learning by making the number of Bellman recursions grow logarithmically, rather than linearly, with the horizon. The approach uses the triangle-inequality structure of goal-conditioned RL to define a transitive Bellman update rule, so that value learning can scale to long-horizon tasks. The authors claim this is the first practical realization of divide-and-conquer value learning at scale in goal-conditioned RL settings, building on an idea traceable to Kaelbling (1993). The post frames the approach as a third paradigm alongside TD and Monte Carlo methods, one that addresses a key gap in scalable off-policy RL.
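The logarithmic-versus-linear contrast in the summary can be illustrated with a small tabular sketch. This is not the method from the blog post, which concerns learned value functions at scale. It is a toy under stated assumptions: a deterministic, unit-cost chain of 17 states, exact tabular distances in place of values, and hypothetical helper names (`chain_costs`, `one_step_sweep`, `transitive_sweep`, `sweeps_to_converge`). The one-step backup extends known distances by one transition per sweep, while the transitive backup d(s, g) ← min over w of d(s, w) + d(w, g) doubles the covered horizon per sweep.

```python
import math

INF = math.inf


def chain_costs(n):
    """One-step costs on a directed chain 0 -> 1 -> ... -> n-1 (toy environment, an assumption)."""
    d = [[INF] * n for _ in range(n)]
    for s in range(n):
        d[s][s] = 0.0          # staying at the goal costs nothing
        if s + 1 < n:
            d[s][s + 1] = 1.0  # each transition has unit cost
    return d


def one_step_sweep(d, cost):
    """TD-style backup: d(s, g) <- min over s' of cost(s, s') + d(s', g)."""
    n = len(d)
    return [[min(cost[s][k] + d[k][g] for k in range(n)) for g in range(n)]
            for s in range(n)]


def transitive_sweep(d):
    """Divide-and-conquer backup: d(s, g) <- min over subgoals w of d(s, w) + d(w, g)."""
    n = len(d)
    return [[min(d[s][w] + d[w][g] for w in range(n)) for g in range(n)]
            for s in range(n)]


def sweeps_to_converge(update, d):
    """Count the sweeps that still change the table before a fixed point is reached."""
    count = 0
    while True:
        new = update(d)
        if new == d:
            return count, d
        d, count = new, count + 1


n = 17  # longest path has 16 transitions
cost = chain_costs(n)
td_sweeps, td_dist = sweeps_to_converge(lambda d: one_step_sweep(d, cost), chain_costs(n))
dc_sweeps, dc_dist = sweeps_to_converge(transitive_sweep, chain_costs(n))
print("one-step sweeps:", td_sweeps)
print("transitive sweeps:", dc_sweeps)
```

Both procedures reach the same distance table on this chain; only the number of sweeps differs. The transitive update is the same min-plus composition that underlies all-pairs shortest-path methods such as Floyd-Warshall, which appears among the co-occurring entities.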

Evaluation and Benchmarking · Agent and Tool Ecosystem · Leslie Pack Kaelbling · Divide-and-Conquer Value Learning · Berkeley AI Research (BAIR) · +8 more