REAR: Test-time Preference Realignment through Reward Decomposition

paper · active · provisional · rear-test-time-preference-realignment-through-reward-decomposition-411a08aa · 1 events · first seen 15h ago

Recent events (1)

6 · arXiv · cs.CL · 15h ago

REAR: Test-time reward decomposition for preference realignment in LLMs

Researchers introduce REAR (REAlignment Reward), a training-free framework for aligning LLMs with diverse user preferences at test time. The method decomposes the reward function into a question-related component and a preference-related component, then derives a realignment reward that can be written as a linear combination of token-level log-probabilities. Because the reward is computed from log-probabilities alone, it can serve as the scoring function in existing test-time scaling algorithms such as best-of-N sampling and tree search. The reported experiments cover preference alignment, math, and visual tasks.
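
To make the idea of a reward built from token-level log-probabilities concrete, here is a minimal sketch of best-of-N selection with such a score. It is illustrative only and is not the paper's formula: the summary does not give the actual decomposition or coefficients, so the function names (`realignment_reward`, `best_of_n`), the contrast between log-probabilities computed with and without a stated preference, and the default weights `+1.0` and `-1.0` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def realignment_reward(
    logp_with_pref: Sequence[float],
    logp_without_pref: Sequence[float],
    weight_with: float = 1.0,      # assumed coefficient, not from the paper
    weight_without: float = -1.0,  # assumed coefficient, not from the paper
) -> float:
    """Hypothetical score: a linear combination of token-level log-probs.

    Each sequence holds one log-probability per response token, obtained
    (in this sketch) from the same model with and without the user's
    preference in the prompt.
    """
    if len(logp_with_pref) != len(logp_without_pref):
        raise ValueError("token log-prob sequences must have equal length")
    return sum(
        weight_with * lw + weight_without * lo
        for lw, lo in zip(logp_with_pref, logp_without_pref)
    )


def best_of_n(
    candidates: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[int, List[float]]:
    """Score every candidate and return (index of the best, all scores)."""
    scores = [realignment_reward(w, wo) for w, wo in candidates]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Two made-up candidates, each a pair of per-token log-prob lists.
candidates = [
    ([-1.0, -2.0], [-1.0, -2.0]),  # the preference changes nothing
    ([-0.5, -1.0], [-2.0, -2.5]),  # the preference makes the response likelier
]
best_index, scores = best_of_n(candidates)
```

In this toy setup the second candidate is selected, because stating the preference raises its token log-probabilities. The same scoring function could in principle rank partial sequences inside a tree search, although how REAR does so is not described in the summary above.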