
REAlignment Reward

technique · active · provisional · realignment-reward-eafd7dc0 · 1 event · first seen 16h ago

Aliases: REAlignment Reward


Recent events (1)

arXiv · cs.CL · 16h ago

REAR: Test-time reward decomposition for preference realignment in LLMs

Researchers introduce REAR (REAlignment Reward), a training-free framework for aligning LLMs with diverse user preferences at test time. The method decomposes the reward function into a question-related component and a preference-related component, then derives a realignment reward that can be expressed as a linear combination of token-level log-probabilities. Because of this form, it plugs directly into existing test-time scaling algorithms such as best-of-N sampling and tree search. The authors report that it generalizes across preference alignment, math, and visual tasks.
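The sketch below illustrates why a reward that is a linear combination of token-level log-probabilities slots easily into best-of-N sampling: each candidate can be scored from log-probabilities the model already produces, with no separate reward model. The specific decomposition used here is an assumption for illustration, not REAR's published formula: a "question term" (summed log-probs of the response given only the question) plus a "preference term" (the summed log-prob gain from also conditioning on the user's preference), weighted by hypothetical coefficients `alpha` and `beta`. The function names and the toy log-probability values are likewise hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical weights; the source does not specify REAR's coefficients.
ALPHA = 1.0  # weight on the assumed question-related term
BETA = 2.0   # weight on the assumed preference-related term


def realignment_score(
    logp_with_pref: Sequence[float],
    logp_without_pref: Sequence[float],
    alpha: float = ALPHA,
    beta: float = BETA,
) -> float:
    """Score one response as a linear combination of token log-probs.

    logp_with_pref[t]:    log p(y_t | question, preference, y_<t)  (assumed input)
    logp_without_pref[t]: log p(y_t | question, y_<t)              (assumed input)
    """
    if len(logp_with_pref) != len(logp_without_pref):
        raise ValueError("token log-prob sequences must have equal length")
    # Assumed question-related term: likelihood of the response given the question.
    question_term = sum(logp_without_pref)
    # Assumed preference-related term: per-token gain from adding the preference.
    preference_term = sum(
        a - b for a, b in zip(logp_with_pref, logp_without_pref)
    )
    return alpha * question_term + beta * preference_term


def best_of_n(candidates: Sequence[Tuple[str, Sequence[float], Sequence[float]]]) -> str:
    """Return the text of the highest-scoring (text, logp_with, logp_without) candidate."""
    best = max(candidates, key=lambda c: realignment_score(c[1], c[2]))
    return best[0]


if __name__ == "__main__":
    # Toy log-probabilities standing in for values from a policy model.
    candidates = [
        ("formal answer", [-0.2, -0.3, -0.1], [-0.5, -0.6, -0.4]),
        ("casual answer", [-0.9, -1.1, -0.8], [-0.3, -0.2, -0.4]),
    ]
    print(best_of_n(candidates))  # prints: formal answer
```

With these toy numbers the "formal answer" scores about 0.3 (a weaker question term outweighed by a large preference gain), while the "casual answer" scores about -4.7, so best-of-N keeps the formal one. Because the score is additive over tokens, the same per-token quantity could in principle also guide partial expansions in a tree search, which is the property the summary highlights.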