Entity · technique

Double Q-learning

technique · active · double-q-learning-d13a4f19 · 1 event · first seen Jun 12, 2026

Aliases: Double Q-learning

Co-occurring entities

DoorDash Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

More like this (12)

Q-learning · Soft Q-Learning · Semantic Pareto-DQN · DNQ: Deep Nash Q-Network for Partially Observable n-Player Games · Quantisation-Aware Training (QAT) · Do You Really Need to Pretrain Q-Functions for Online RL Fine-Tuning? · shielded reinforcement learning · sim-to-real reinforcement learning · Physics-EnhAnced Reinforcement Learning · decoupled reinforcement learning · ExpRL: Exploratory RL for LLM Mid-Training · Entropy-Regularized Reinforcement Learning

Recent events (1)

arXiv · cs.AI · Jun 12, 2026

DoorDash deploys multi-agent RL system for adaptive dispatch objective weights in food-delivery marketplace

Researchers at DoorDash present a deployed reinforcement learning system that adapts dispatch objective weights in a three-sided food-delivery marketplace using delayed operational feedback signals. Rather than replacing the combinatorial optimizer, a store-level policy selects discrete multipliers that shift the optimizer's tradeoff between delivery quality and batching efficiency. The system uses centralized offline training with Double Q-learning and a conservative regularizer to handle out-of-distribution overestimation, then executes decentrally per store. A production switchback experiment shows increased batching and reduced courier time costs without degrading customer delivery quality.
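The training recipe named in the summary (Double Q-learning plus a conservative regularizer, trained offline on logged data) can be illustrated with a minimal tabular sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tabular setting, the multiplier values in `MULTIPLIERS`, the hyperparameters, the toy transitions, and the choice of a CQL-style penalty (log-sum-exp of the Q-values minus the Q-value of the logged action) as the conservative term. The Double Q-learning part follows the standard two-estimator form, in which one table selects the next action and the other evaluates it.

```python
import math
import random

# Hypothetical discrete multipliers a store-level policy could choose from.
MULTIPLIERS = (0.8, 1.0, 1.2)


def make_table(n_states, n_actions=len(MULTIPLIERS)):
    """Create a zero-initialised Q-table as a list of per-state rows."""
    return [[0.0] * n_actions for _ in range(n_states)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def double_q_target(q_select, q_eval, next_state, reward, done, gamma):
    """Double Q-learning target: q_select picks the action, q_eval scores it."""
    if done:
        return reward
    row = q_select[next_state]
    best_action = max(range(len(row)), key=row.__getitem__)
    return reward + gamma * q_eval[next_state][best_action]


def conservative_double_q_update(q_a, q_b, transition, rng,
                                 lr=0.1, gamma=0.9, penalty=0.5):
    """Apply one offline update to a randomly chosen table (modified in place)."""
    state, action, reward, next_state, done = transition
    # Randomly pick which estimator to update; the other one evaluates.
    if rng.random() < 0.5:
        q_upd, q_other = q_a, q_b
    else:
        q_upd, q_other = q_b, q_a

    target = double_q_target(q_upd, q_other, next_state, reward, done, gamma)
    td_error = target - q_upd[state][action]

    # Assumed CQL-style penalty: logsumexp_a Q(s, a) - Q(s, a_logged).
    # Its gradient w.r.t. Q(s, a) is softmax(Q(s, .))[a] - 1[a == a_logged],
    # so actions absent from the log are pushed down, the logged one up.
    probs = softmax(q_upd[state])
    for a, p in enumerate(probs):
        indicator = 1.0 if a == action else 0.0
        q_upd[state][a] -= lr * penalty * (p - indicator)

    # Standard temporal-difference step on the logged action.
    q_upd[state][action] += lr * td_error


if __name__ == "__main__":
    rng = random.Random(0)
    q_a, q_b = make_table(2), make_table(2)
    # Toy logged transitions: (state, action, reward, next_state, done).
    logged = [(0, 2, 1.0, 1, False), (1, 1, 0.5, 0, True), (0, 0, 0.2, 1, False)]
    for _ in range(200):
        conservative_double_q_update(q_a, q_b, rng.choice(logged), rng)
    # Act greedily on the average of both estimators.
    avg = [(a + b) / 2 for a, b in zip(q_a[0], q_b[0])]
    best = max(range(len(avg)), key=avg.__getitem__)
    print("greedy multiplier for state 0:", MULTIPLIERS[best])
```

In this sketch the two mechanisms address different sources of overestimation: decoupling action selection from evaluation reduces the upward bias of the max operator, while the penalty lowers the values of actions that never appear in the logged data. A deployed system would more plausibly use neural Q-functions than tables, which the summary does not specify.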

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Double Q-learning · DoorDash Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch