technique
Hierarchical Relative Policy Optimization
technique · active · provisional
hierarchical-relative-policy-optimization-66659289 · 1 event · first seen 2d ago
Aliases: Hierarchical Relative Policy Optimization
More like this (12)
Preference Coordinated Multi-agent Policy Optimization
Pareto Optimal Policy Optimization
GRPO (Group Relative Policy Optimization)
APPO: Agentic Procedural Policy Optimization
Divergence Regularized Policy Optimization
Proximal Policy Optimization
Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization
Vector Policy Optimization
Hierarchical Reinforcement Learning
Evolved Policy Gradients
hierarchical summarization
Gravity-Weighted Direct Preference Optimization
Recent events (1)
AdaSR: Adaptive streaming reasoning framework with Hierarchical Relative Policy Optimization
Researchers introduce AdaSR, a framework enabling large reasoning models to reason incrementally during streaming input (e.g., audio/video) rather than waiting for complete context, then perform final deliberation once the stream ends. The core contribution is Hierarchical Relative Policy Optimization (HRPO), which decomposes policy optimization into streaming and deep reasoning phases with fine-grained per-phase advantage assignment, integrating format, accuracy, and latency-aware rewards. Experiments show AdaSR improves the tradeoff among reasoning accuracy, computational efficiency, and streaming latency over supervised fine-tuning baselines. Code is publicly released.
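
The summary above does not give HRPO's actual formulas, so the following is only a minimal sketch of what "per-phase advantage assignment" could look like. It assumes GRPO-style group normalisation applied separately to the streaming phase and the deep reasoning phase. The `Rollout` fields, the way rewards are split between phases, and both function names are hypothetical and are not taken from the AdaSR code.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    # Hypothetical per-phase scalar rewards for one sampled response.
    streaming_reward: float   # assumed: format + latency-aware terms
    deep_reward: float        # assumed: format + accuracy terms
    n_streaming_tokens: int   # tokens generated while the stream is arriving
    n_deep_tokens: int        # tokens generated in the final deliberation


def group_relative(rewards, eps=1e-8):
    """Normalise rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_phase_token_advantages(group):
    """Give each token the group-relative advantage of its own phase."""
    stream_adv = group_relative([r.streaming_reward for r in group])
    deep_adv = group_relative([r.deep_reward for r in group])
    return [
        [a_s] * r.n_streaming_tokens + [a_d] * r.n_deep_tokens
        for r, a_s, a_d in zip(group, stream_adv, deep_adv)
    ]


# Two rollouts for the same prompt: one does better while streaming,
# the other does better in the final deliberation.
group = [Rollout(1.0, 0.0, 2, 3), Rollout(0.0, 1.0, 2, 3)]
advantages = per_phase_token_advantages(group)
```

Under these assumptions, the first rollout's streaming tokens receive a positive advantage and its deep reasoning tokens a negative one, so each phase is credited on its own reward rather than on a single sequence-level score.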