paper

Generalization in offline RL: The structure is more important than the amount of pessimism

paper · active · provisional · generalization-in-offline-rl-the-structure-is-more-important-than-the-amount-of-pessimism-e0cb1ac9 · 1 event · first seen 14h ago

Aliases: Generalization in offline RL: The structure is more important than the amount of pessimism

Co-occurring entities

CQL · IQL

More like this (12)

Pessimism's Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models
ExpRL: Exploratory RL for LLM Mid-Training
KL-regularized RL
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Radial Suppression Accelerates Algorithmic Generalization: A Geometric Analysis of Delayed Generalization
Generalized LR Parsing
General Preference Reinforcement Learning
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
Turing-RL
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

Recent events (1)

4 · arXiv · cs.AI · 14h ago

Offline RL generalization depends on symmetry structure of pessimism, not its magnitude

A new arXiv preprint argues that successful generalization in offline reinforcement learning depends on whether the pessimistic value function respects the symmetries of the optimal solution, not on how much pessimism is applied. The authors prove that a mildly pessimistic value function that breaks these symmetries can generalize worse than an overly pessimistic one that respects them, a result with implications for how data augmentation should be applied. They validate the theory empirically with IQL and CQL on a rotationally symmetric reacher environment and recommend adding a consistency loss during policy extraction rather than training on an augmented dataset.
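The recommended consistency loss can be illustrated with a minimal sketch. It assumes a toy point-mass version of the reacher in which both the planar state s (position relative to the goal) and the action a (a velocity command) rotate under the symmetry, so a symmetry-respecting policy should satisfy π(R s) = R π(s). The consistency term penalizes violations of that equation over sampled rotations and is added to an IQL-style advantage-weighted regression loss on the original, un-augmented data. The linear policy, the function names (`consistency_loss`, `weighted_bc_loss`, `extraction_objective`), the weight `lam`, the temperature `beta` and the weight cap `max_weight` are illustrative assumptions, not the preprint's actual formulation or code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float]
Policy = Callable[[Vec], Vec]


def rotate(v: Vec, theta: float) -> Vec:
    """Rotate a 2-D vector counter-clockwise by theta radians (the assumed symmetry)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two 2-D vectors."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def linear_policy(w: Sequence[Sequence[float]]) -> Policy:
    """Hypothetical deterministic actor a = W s, standing in for a trained network."""
    def act(s: Vec) -> Vec:
        return (w[0][0] * s[0] + w[0][1] * s[1],
                w[1][0] * s[0] + w[1][1] * s[1])
    return act


def consistency_loss(policy: Policy, states: Sequence[Vec], angles: Sequence[float]) -> float:
    """Mean squared violation of pi(R s) = R pi(s) over all (state, angle) pairs."""
    total = 0.0
    for s in states:
        for theta in angles:
            total += sq_dist(policy(rotate(s, theta)), rotate(policy(s), theta))
    return total / (len(states) * len(angles))


def weighted_bc_loss(policy: Policy, batch: Sequence[Tuple[Vec, Vec]],
                     advantages: Sequence[float], beta: float = 3.0,
                     max_weight: float = 100.0) -> float:
    """Advantage-weighted regression onto dataset actions (IQL-style extraction).

    With a fixed-variance Gaussian policy the log-likelihood reduces to a
    squared error, which is what is used here. The exponent is clipped so each
    weight is at most max_weight (which also keeps math.exp from overflowing).
    """
    log_cap = math.log(max_weight)
    total = 0.0
    for (s, a), adv in zip(batch, advantages):
        weight = math.exp(min(beta * adv, log_cap))
        total += weight * sq_dist(policy(s), a)
    return total / len(batch)


def extraction_objective(policy: Policy, batch: Sequence[Tuple[Vec, Vec]],
                         advantages: Sequence[float], angles: Sequence[float],
                         lam: float = 1.0) -> float:
    """Extraction loss on the original data plus lam times the consistency term."""
    states: List[Vec] = [s for s, _ in batch]
    return (weighted_bc_loss(policy, batch, advantages)
            + lam * consistency_loss(policy, states, angles))


# Usage on random toy data.
random.seed(0)
angles = [2 * math.pi * k / 8 for k in range(8)]
states = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(16)]
equivariant = linear_policy([[-0.5, 0.2], [-0.2, -0.5]])  # form [[a, -b], [b, a]]: commutes with rotations
skewed = linear_policy([[-1.0, 0.0], [0.0, -0.2]])        # treats the x and y axes differently
batch = [(s, equivariant(s)) for s in states]              # toy dataset actions from the equivariant policy
advantages = [random.uniform(-1, 1) for _ in states]

print(consistency_loss(equivariant, states, angles))  # close to 0 (rounding error only)
print(consistency_loss(skewed, states, angles))       # clearly positive
```

A policy matrix of the form [[a, -b], [b, a]] commutes with every planar rotation, so its consistency loss vanishes up to rounding, while a matrix that scales the two axes differently is penalized. The augmented-dataset alternative would instead append rotated pairs (R s, R a) to the batch; this sketch does not reproduce the paper's argument for preferring the consistency term.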

Evaluation and Benchmarking · Generalization in offline RL: The structure is more important than the amount of pessimism · CQL · IQL