paper
REAR: Test-time Preference Realignment through Reward Decomposition
paper · active · provisional
rear-test-time-preference-realignment-through-reward-decomposition-411a08aa · 1 event · first seen 15h ago
Aliases: REAR: Test-time Preference Realignment through Reward Decomposition
More like this (12)
Hybrid Reward Advantage Splitting; RREDCoT: Segment-Level Reward Redistribution for Reasoning Models; REAlignment Reward; In-Context Reward Adaptation; Gradient-Guided Reward Optimization; Drifting Preference Optimization; reward model; ExpRL: Exploratory RL for LLM Mid-Training; rule-based reinforcement learning rewards; Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning; Rule-Based Rewards; reward-induced maximum likelihood
Recent events (1)
REAR: Test-time reward decomposition for preference realignment in LLMs
Researchers introduce REAR (REAlignment Reward), a training-free framework for aligning LLMs with diverse user preferences at test time. The method decomposes the reward function into question-related and preference-related components, then derives a realignment reward expressible as a linear combination of token-level log-probabilities. This formulation integrates cleanly with existing test-time scaling algorithms like best-of-N sampling and tree search, and experiments show it generalizes across preference alignment, math, and visual tasks.
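
To make the idea of a reward built from token-level log-probabilities concrete, here is a minimal sketch of how such a score could plug into best-of-N sampling. It is not the paper's formula: the summary above does not give the exact coefficients or conditioning contexts. The sketch assumes two hypothetical inputs per candidate (log-probabilities of its tokens with and without the preference in the prompt) and illustrative weights `w_pref` and `w_base`. All function and field names are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence

Candidate = Dict[str, List[float]]


def linear_logprob_reward(
    logp_pref: Sequence[float],
    logp_base: Sequence[float],
    w_pref: float = 1.0,   # assumed weight, not taken from the paper
    w_base: float = -1.0,  # assumed weight, not taken from the paper
) -> float:
    """Score a response as a linear combination of per-token log-probs.

    logp_pref: log-probs of the response tokens when the prompt includes
               the user preference (hypothetical input).
    logp_base: log-probs of the same tokens given the question alone
               (hypothetical input).
    """
    if len(logp_pref) != len(logp_base):
        raise ValueError("both sequences must cover the same tokens")
    return sum(w_pref * a + w_base * b for a, b in zip(logp_pref, logp_base))


def best_of_n(
    candidates: Sequence[Candidate],
    score: Callable[[Candidate], float],
) -> Candidate:
    """Return the candidate with the highest score (plain best-of-N)."""
    return max(candidates, key=score)


def score_candidate(c: Candidate) -> float:
    return linear_logprob_reward(c["logp_pref"], c["logp_base"])


# Toy numbers, made up purely to exercise the code.
toy_candidates: List[Candidate] = [
    {"logp_pref": [-1.0, -2.0], "logp_base": [-1.5, -2.5]},
    {"logp_pref": [-0.5, -0.5], "logp_base": [-0.4, -0.4]},
]
chosen = best_of_n(toy_candidates, score_candidate)
```

With the default weights, the score favors responses whose tokens become more likely once the preference is included in the prompt. The same scoring function could in principle rank partial sequences in a tree search, since it is a sum over tokens.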