Evolved Policy Gradients

technique · active · evolved-policy-gradients-ab4538b4 · 1 event · first seen 28d ago

Recent events (1)

OpenAI Blog · 28d ago

Evolved Policy Gradients: OpenAI Meta-Learning via Loss Function Evolution

OpenAI released Evolved Policy Gradients (EPG), a meta-learning method that evolves the loss function used to train reinforcement learning agents rather than hand-designing it. The approach enables faster adaptation to novel tasks, with agents demonstrating generalization to test-time scenarios outside their training distribution, such as navigating to objects placed in new locations. EPG represents an experimental direction in automated algorithm discovery for RL.
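
To make the idea of evolving a loss function concrete, here is a minimal toy sketch in Python. It is not OpenAI's implementation. The task (reach a scalar target), the two-parameter quadratic loss, and the names `learned_loss_grad`, `inner_loop`, `fitness`, and `evolve_loss` are all assumptions chosen for illustration, and the toy agent is assumed to observe the target position directly. An inner loop trains a fresh agent by gradient descent on the learned loss only. An outer loop perturbs the loss parameters `phi` with an evolution-strategies style estimate and keeps the changes that raise the true task return, which the agent itself never sees.

```python
import numpy as np

def learned_loss_grad(theta, obs, phi):
    """Gradient w.r.t. theta of a toy learned loss (an assumption, not EPG's loss):
    L_phi(theta, obs) = (theta - (phi[0] * obs + phi[1])) ** 2."""
    return 2.0 * (theta - (phi[0] * obs + phi[1]))

def inner_loop(phi, target, steps=20, lr=0.1):
    """Train a fresh agent (a scalar theta) on the learned loss only,
    then score it with the true return, which is hidden from the agent."""
    theta = 0.0
    for _ in range(steps):
        theta -= lr * learned_loss_grad(theta, target, phi)
    return -(theta - target) ** 2

def fitness(phi, targets):
    """Average true return of agents trained with loss parameters phi."""
    return float(np.mean([inner_loop(phi, t) for t in targets]))

def evolve_loss(generations=200, pop=16, sigma=0.1, outer_lr=0.05, seed=0):
    """Outer loop: evolution-strategies style update of the loss parameters."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(2)  # start from a loss that teaches the agent nothing useful
    for _ in range(generations):
        targets = rng.uniform(-1.0, 1.0, size=8)   # training task distribution
        eps = rng.standard_normal((pop, 2))        # perturbation directions
        # Antithetic pairs: evaluate phi + sigma*e and phi - sigma*e.
        f_plus = np.array([fitness(phi + sigma * e, targets) for e in eps])
        f_minus = np.array([fitness(phi - sigma * e, targets) for e in eps])
        grad = ((f_plus - f_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
        phi = phi + outer_lr * grad                # ascend the true return
    return phi

if __name__ == "__main__":
    phi = evolve_loss()
    new_targets = np.array([1.5, 2.0, 2.5])        # outside the training range
    print("evolved phi:", phi)
    print("return with initial loss:", fitness(np.zeros(2), new_targets))
    print("return with evolved loss:", fitness(phi, new_targets))
```

In this toy, `phi[0]` should drift toward roughly 1, meaning the evolved loss tells the agent to move to wherever the observed target is, so agents trained with it also do well on targets outside the training range. That outcome follows from the deliberately simple loss family used here and says nothing about how well the published method generalizes.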