Entity · technique

rubric-based reward shaping

technique · active · rubric-based-reward-shaping-ea61bec7 · 1 event · first seen May 19, 2026

Aliases: rubric-based reward shaping

Co-occurring entities

semantic retrieval · Reinforcement Learning from Human Feedback · AMARIS

More like this (12)

rubric-based rewards · Rubric Reward · Rule-Based Rewards · Rubric-based Feedback Evaluation · Preference-Aware Rubric Learning · Rubric-Conditioned Self-Distillation · rule-based reinforcement learning rewards · Process Reward Model · Rubrics on Trial · Reward Learning from Comparisons · reinforcement fine-tuning · reward model

Recent events (1)

6 · arXiv · cs.CL · May 19, 2026

AMARIS: Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

AMARIS introduces a persistent evaluation memory to improve rubric-based reward shaping when fine-tuning LLMs with reinforcement learning. Prior adaptive rubric methods discard evaluation diagnostics after each step; AMARIS instead accumulates step-level summaries and retrieves relevant historical context through both static retrieval (recent steps) and dynamic retrieval (semantic similarity) to inform rubric updates. The system runs asynchronously alongside the RL training loop and adds approximately 5% time overhead. Experiments across closed-ended and open-ended domains show consistent improvements over baselines, and ablations confirm that combining both retrieval modes yields the strongest results.

Evaluation and Benchmarking · Agent and Tool Ecosystem · semantic retrieval · Reinforcement Learning from Human Feedback · AMARIS