technique
rubric-based reward shaping
technique · active
rubric-based-reward-shaping-ea61bec7 · 1 event · first seen 29d ago · Aliases: rubric-based reward shaping
Recent events (1)
AMARIS: Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning
AMARIS introduces a persistent evaluation memory system to improve rubric-based reward shaping in LLM fine-tuning via reinforcement learning. Unlike prior adaptive rubric methods that discard evaluation diagnostics after each step, AMARIS accumulates step-level summaries and retrieves relevant historical context through two modes, static retrieval (recent steps) and dynamic retrieval (semantic similarity), to inform rubric updates. The system runs asynchronously alongside the RL training loop, adding approximately 5% time overhead. Experiments across closed-ended and open-ended domains show consistent improvements over baselines, with ablations confirming that combining both retrieval modes yields the strongest results.
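The two retrieval modes described above can be illustrated with a minimal Python sketch. It is not the AMARIS implementation: the `EvaluationMemory` class, its `add` and `retrieve` methods, and the `k_recent` and `k_similar` parameters are all hypothetical names, and a bag-of-words cosine similarity stands in for whatever semantic similarity measure the actual system uses.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def _bow(text: str) -> Counter:
    """Toy stand-in for a semantic embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class EvaluationMemory:
    """Hypothetical persistent store of step-level evaluation summaries."""

    # (step, summary) pairs, assumed to be appended in increasing step order.
    entries: list = field(default_factory=list)

    def add(self, step: int, summary: str) -> None:
        self.entries.append((step, summary))

    def retrieve(self, query: str, k_recent: int = 2, k_similar: int = 2) -> list:
        # Static retrieval: the k_recent most recent steps.
        recent = self.entries[-k_recent:] if k_recent > 0 else []
        # Dynamic retrieval: the k_similar older entries most similar to the query.
        older = self.entries[:-k_recent] if k_recent > 0 else self.entries
        q = _bow(query)
        ranked = sorted(older, key=lambda e: _cosine(q, _bow(e[1])), reverse=True)
        similar = ranked[:k_similar]
        # Merge both sets and return them in chronological order.
        return sorted(similar + recent, key=lambda e: e[0])


memory = EvaluationMemory()
memory.add(1, "responses ignore citation format criterion")
memory.add(2, "length criterion rewards verbose answers")
memory.add(3, "math answers correct but reasoning skipped")
memory.add(4, "safety criterion too lenient")
memory.add(5, "tone criterion inconsistent")

context = memory.retrieve("citation format errors persist", k_recent=2, k_similar=1)
for step, summary in context:
    print(step, summary)
```

In this sketch the retrieved context would be passed to whatever component proposes rubric updates. Excluding the most recent entries from the similarity search keeps the two modes from returning duplicates, which is a design choice of the sketch rather than something the summary specifies.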