technique
Double Q-learning
technique · active · provisional
double-q-learning-d13a4f19 · 1 event · first seen 5d ago
Aliases: Double Q-learning
Co-occurring entities
More like this (12)
Q-learning
Soft Q-Learning
DNQ: Deep Nash Q-Network for Partially Observable n-Player Games
Quantisation-Aware Training (QAT)
shielded reinforcement learning
sim-to-real reinforcement learning
decoupled reinforcement learning
ExpRL: Exploratory RL for LLM Mid-Training
Entropy-Regularized Reinforcement Learning
KL-regularized RL
Alternating Token-Weighted Unlearning
reinforcement learning from verifier feedback
Recent events (1)
DoorDash deploys multi-agent RL system for adaptive dispatch objective weights in food-delivery marketplace
Researchers at DoorDash present a deployed reinforcement learning system that adapts dispatch objective weights in a three-sided food-delivery marketplace using delayed operational feedback signals. Rather than replacing the combinatorial optimizer, a store-level policy selects discrete multipliers that shift the optimizer's tradeoff between delivery quality and batching efficiency. Training is centralized and offline, using Double Q-learning with a conservative regularizer to counter value overestimation on out-of-distribution actions; execution is decentralized, per store. A production switchback experiment shows increased batching and reduced courier time costs without degrading customer delivery quality.
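
The summary above names two ingredients, Double Q-learning and a conservative regularizer, without giving their exact form. The sketch below is one plausible reading, not the DoorDash implementation: it assumes the common variant in which one value estimate selects the next action and a second estimate scores it, and it assumes a CQL-style log-sum-exp penalty as a stand-in for the unspecified regularizer. The function names, the `MULTIPLIERS` list, and all numbers are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical discrete action set: multipliers applied to a dispatch objective weight.
MULTIPLIERS = [0.8, 1.0, 1.2]


def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target.

    The online estimate picks the greedy next action; the second (target)
    estimate supplies the value of that action. Decoupling selection from
    evaluation reduces the upward bias of a plain max over noisy estimates.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def conservative_penalty(q_values, logged_action):
    """Assumed CQL-style penalty: logsumexp(Q) - Q[logged_action].

    Always non-negative; it grows when actions absent from the logged data
    are assigned high values relative to the action that was actually taken.
    """
    m = max(q_values)
    log_sum_exp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return log_sum_exp - q_values[logged_action]


def offline_loss(q_values, logged_action, target, alpha=1.0):
    """Squared TD error plus the conservative penalty weighted by alpha."""
    td_error = q_values[logged_action] - target
    return td_error ** 2 + alpha * conservative_penalty(q_values, logged_action)


# One logged transition with made-up value estimates for the next state.
next_q_online = [1.0, 3.0, 2.0]   # online estimate prefers action 1
next_q_target = [1.5, 2.0, 4.0]   # target estimate is noisy-high on action 2

y_double = double_q_target(0.5, next_q_online, next_q_target, gamma=0.9)
y_single = 0.5 + 0.9 * max(next_q_target)  # single-estimator max, for comparison

current_q = [0.2, 1.0, 0.4]
example_loss = offline_loss(current_q, logged_action=1, target=y_double, alpha=0.5)
```

In this toy transition the double-estimator target is lower than the single-estimator one because the action chosen by the online estimate is not the one the target estimate rates highest. The weight `alpha` trades off fit to the TD target against conservatism; the value used by the deployed system is not stated in the source.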