Double Q-learning

technique · active · provisional · double-q-learning-d13a4f19 · 1 event · first seen 5d ago

Aliases: Double Q-learning

Recent events (1)

6 · arXiv · cs.AI · 5d ago

DoorDash deploys multi-agent RL system for adaptive dispatch objective weights in food-delivery marketplace

Researchers at DoorDash present a deployed reinforcement learning system that adapts dispatch objective weights in a three-sided food-delivery marketplace using delayed operational feedback signals. Rather than replacing the combinatorial optimizer, a store-level policy selects discrete multipliers that shift the optimizer's tradeoff between delivery quality and batching efficiency. The system is trained centrally and offline with Double Q-learning plus a conservative regularizer that curbs overestimation on out-of-distribution actions, and is then executed in a decentralized fashion, store by store. A production switchback experiment shows increased batching and reduced courier time costs without degrading customer delivery quality.
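For readers unfamiliar with the technique, here is a minimal tabular sketch of the textbook Double Q-learning update, in which one value table picks the greedy next action and the other table scores it. It is an illustration only, not the DoorDash system: the function name `double_q_update`, the dict-of-lists table layout, and the default `alpha` and `gamma` values are all assumptions made for the example. The conservative regularizer is left out because the summary does not say what form it takes.

```python
import random


def argmax(values):
    """Return the index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_q_update(q_a, q_b, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one tabular Double Q-learning update in place.

    q_a and q_b are dicts mapping a state to a list of action values
    (a hypothetical layout chosen for this sketch). A coin flip decides
    which table is updated. That table selects the greedy next action,
    and the other table evaluates it, which decouples action selection
    from value estimation.
    """
    if rng.random() < 0.5:
        selector, evaluator = q_a, q_b
    else:
        selector, evaluator = q_b, q_a

    if done:
        target = reward  # no bootstrapping past a terminal transition
    else:
        best = argmax(selector[next_state])
        target = reward + gamma * evaluator[next_state][best]

    # Move the selected table's estimate toward the target.
    selector[state][action] += alpha * (target - selector[state][action])
    return target
```

In a setting like the one summarized above, a state would presumably encode store-level features and each action index would stand for one discrete multiplier, but that mapping is a guess rather than something the summary specifies.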