paper

REAR: Test-time Preference Realignment through Reward Decomposition

paper · active · provisional · rear-test-time-preference-realignment-through-reward-decomposition-411a08aa · 1 event · first seen 15h ago

Aliases: REAR: Test-time Preference Realignment through Reward Decomposition

Co-occurring entities

REAlignment Reward

More like this (12)

Hybrid Reward Advantage Splitting · RREDCoT: Segment-Level Reward Redistribution for Reasoning Models · REAlignment Reward · In-Context Reward Adaptation · Gradient-Guided Reward Optimization · Drifting Preference Optimization · reward model · ExpRL: Exploratory RL for LLM Mid-Training · rule-based reinforcement learning rewards · Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning · Rule-Based Rewards · reward-induced maximum likelihood

Recent events (1)

6 · arXiv · cs.CL · 15h ago

REAR: Test-time reward decomposition for preference realignment in LLMs

Researchers introduce REAR (REAlignment Reward), a training-free framework for aligning LLMs with diverse user preferences at test time. The method decomposes the reward function into a question-related component and a preference-related component, then derives a realignment reward that can be written as a linear combination of token-level log-probabilities. In this form the reward plugs into existing test-time scaling algorithms such as best-of-N sampling and tree search. The reported experiments cover preference alignment, math, and visual tasks, and show the approach generalizing across all three.
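To make the idea concrete, here is a minimal Python sketch of best-of-N selection driven by a reward that is a linear combination of token-level log-probabilities. It is not the paper's formula: the summary does not state which log-probabilities are combined or with what coefficients. The two scoring contexts (response tokens scored with and without the user preference in the prompt), the weights `(1.0, -1.0)`, and the names `linear_logprob_reward` and `best_of_n` are all assumptions made for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def linear_logprob_reward(
    logprob_rows: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of token-level log-probabilities.

    logprob_rows[k][t] is the log-probability of response token t under the
    k-th scoring context (hypothetical contexts, chosen for illustration).
    weights[k] is the coefficient applied to every token of context k.
    """
    if len(logprob_rows) != len(weights):
        raise ValueError("need exactly one weight per scoring context")
    if not logprob_rows:
        raise ValueError("need at least one scoring context")
    n_tokens = len(logprob_rows[0])
    if any(len(row) != n_tokens for row in logprob_rows):
        raise ValueError("all contexts must score the same tokens")
    return sum(w * sum(row) for w, row in zip(weights, logprob_rows))


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Best-of-N sampling: return the candidate with the highest score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Toy data, made up for this sketch. Row 0: log-probs with the preference in
# the prompt; row 1: log-probs with the question only.
candidates = [
    {"text": "response A", "rows": [[-1.0, -2.0], [-1.5, -1.5]]},
    {"text": "response B", "rows": [[-0.5, -0.5], [-2.0, -2.0]]},
]

# Assumed weights: reward tokens that become more likely once the preference
# is stated, relative to the question-only context.
WEIGHTS = (1.0, -1.0)

best = best_of_n(
    candidates,
    lambda c: linear_logprob_reward(c["rows"], WEIGHTS),
)
print(best["text"])
```

Because the score is a plain per-token sum, the same function could also rank partial responses, which is the property that would let a tree search use it at intermediate nodes.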

Evaluation and Benchmarking · Inference Economics · REAlignment Reward · REAR: Test-time Preference Realignment through Reward Decomposition