technique

IQL

technique · active · provisional · iql-491af432 · 1 event · first seen 13h ago

Aliases: IQL

Co-occurring entities

Generalization in offline RL: The structure is more important than the amount of pessimism · CQL

More like this (12)

CQL · IndQA · QVal · QVal · TripletQL · MedQADE · Web IQ · QAGS · StrategyQA · QUBRIC · FreshQA · GQA

Recent events (1)

4 · arXiv · cs.AI · 13h ago

Offline RL generalization depends on symmetry structure of pessimism, not its magnitude

A new arXiv preprint argues that generalization in offline reinforcement learning depends on whether the pessimistic value function respects the symmetries of the optimal solution, not on how much pessimism is applied. The authors prove that a mildly pessimistic but non-symmetric value function can generalize worse than an overly pessimistic symmetric one, which has implications for how data augmentation should be applied. They validate the theory empirically with IQL and CQL on a rotationally symmetric reacher environment and recommend adding a consistency loss during policy extraction rather than training on an augmented dataset.
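
The summary does not give the preprint's loss, so the following is only a minimal sketch of what a rotation consistency penalty could look like, assuming a 2D state and action space in which the optimal policy is rotation-equivariant, i.e. π(Rs) = Rπ(s). The names `rotation`, `consistency_loss`, `equivariant_policy`, and `skewed_policy` are hypothetical and are not taken from the paper, IQL, or CQL.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def consistency_loss(policy, states, thetas):
    """Mean squared gap between policy(R s) and R policy(s).

    Assumption: the task is symmetric under 2D rotations, so an
    ideal policy gives zero loss. States are rows of shape (N, 2).
    """
    total = 0.0
    for theta in thetas:
        R = rotation(theta)
        rotated_states = states @ R.T            # apply R to every state
        act_on_rotated = policy(rotated_states)  # pi(R s)
        rotated_actions = policy(states) @ R.T   # R pi(s)
        gap = act_on_rotated - rotated_actions
        total += np.mean(np.sum(gap ** 2, axis=1))
    return total / len(thetas)

def equivariant_policy(states):
    # Hypothetical policy: push toward the origin; commutes with rotations.
    return -0.5 * states

def skewed_policy(states):
    # Hypothetical policy with axis-dependent gains; breaks the symmetry.
    W = np.array([[-0.5, 0.0], [0.0, -0.1]])
    return states @ W.T

rng = np.random.default_rng(0)
states = rng.normal(size=(256, 2))
thetas = rng.uniform(0.0, 2.0 * np.pi, size=8)

loss_eq = consistency_loss(equivariant_policy, states, thetas)
loss_skew = consistency_loss(skewed_policy, states, thetas)
print(f"equivariant: {loss_eq:.3e}  skewed: {loss_skew:.3e}")
```

In this sketch the penalty is near zero for the equivariant policy and positive for the skewed one. In practice such a term would presumably be added to the policy-extraction objective with a weighting coefficient, which is left out here because the summary does not specify it.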

Evaluation and Benchmarking · Generalization in offline RL: The structure is more important than the amount of pessimism · CQL · IQL