Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch
Recent events (1)
DoorDash deploys multi-agent RL system for adaptive dispatch objective weights in food-delivery marketplace
Researchers at DoorDash present a deployed reinforcement learning system that adapts dispatch objective weights in a three-sided food-delivery marketplace using delayed operational feedback signals. Rather than replacing the combinatorial optimizer, a store-level policy selects discrete multipliers that shift the optimizer's tradeoff between delivery quality and batching efficiency. The system is trained offline and centrally with Double Q-learning plus a conservative regularizer that counters value overestimation on out-of-distribution actions, then executed in a decentralized manner at each store. A production switchback experiment shows increased batching and reduced courier time costs without degrading customer delivery quality.
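
To make the training recipe in the summary concrete, here is a minimal tabular sketch of a Double Q-learning update combined with a conservative penalty. It is an illustration under stated assumptions, not DoorDash's implementation: the summary does not give the form of the regularizer, so a log-sum-exp penalty in the style of conservative Q-learning is assumed, and the multiplier values, state indexing, hyperparameters, and function names (MULTIPLIERS, conservative_double_q_update, select_multiplier) are all hypothetical.

```python
import numpy as np

# Hypothetical discrete multipliers applied to one optimizer objective weight.
MULTIPLIERS = np.array([0.8, 1.0, 1.2])


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def conservative_double_q_update(q_online, q_target, batch,
                                 gamma=0.9, lr=0.1, alpha=0.5):
    """One pass over logged transitions (s, a, r, s_next, done).

    q_online is updated in place; q_target is a slowly refreshed copy.
    """
    for s, a, r, s_next, done in batch:
        # Double Q-learning: the online table picks the next action,
        # the target table evaluates it.
        a_star = int(np.argmax(q_online[s_next]))
        target = r + (0.0 if done else gamma * q_target[s_next, a_star])
        td_error = target - q_online[s, a]

        # Assumed conservative penalty: logsumexp(Q(s, .)) - Q(s, a).
        # Its gradient w.r.t. Q(s, .) is softmax(Q(s, .)) - onehot(a),
        # which pushes down actions that are absent from the logs.
        penalty_grad = softmax(q_online[s])
        penalty_grad[a] -= 1.0

        q_online[s] -= lr * alpha * penalty_grad
        q_online[s, a] += lr * td_error
    return q_online


def select_multiplier(q_online, s):
    """Decentralized execution: each store greedily reads its own row."""
    return float(MULTIPLIERS[int(np.argmax(q_online[s]))])


if __name__ == "__main__":
    q = np.zeros((2, len(MULTIPLIERS)))
    q_tgt = q.copy()
    # A single logged transition: state 0, action index 1, delayed reward 1.0.
    logged = [(0, 1, 1.0, 1, True)]
    conservative_double_q_update(q, q_tgt, logged)
    print(q[0], select_multiplier(q, 0))
```

In this sketch the logged action's value rises while the values of the two unlogged actions in the same state are pushed below zero, which is the intended effect of a conservative term when training from offline data. Training is shared across all stores through one table, and select_multiplier is the only piece a store would need at execution time.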