Entropy-Cut Metropolis-Hastings

technique · active · provisional · entropy-cut-metropolis-hastings-60346d0a · 1 event · first seen 19d ago

Aliases: Entropy-Cut Metropolis-Hastings

Recent events (1)

7 · arXiv · cs.LG · 19d ago

Entropy-Cut Metropolis-Hastings: Sampling-Based Reasoning Without RL Training

This paper introduces Entropy-Cut Metropolis-Hastings (ECMH), an algorithm that samples from a 'power distribution' over a base language model's outputs to elicit strong reasoning without reinforcement learning (RL) post-training. Rather than cutting reasoning traces at uniformly random positions, ECMH uses next-token entropy as a proxy for consequential decision points (e.g., the choice of proof strategy) and resamples the trace from those positions. The authors prove that mixing time scales with the number of decisions rather than the number of tokens, and report consistent improvements over RL-trained models on MATH500, HumanEval, GPQA Diamond, and AIME26.
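
The summary gives no pseudocode, so the following is only a minimal sketch of how an entropy-weighted cut could fit into a Metropolis-Hastings loop, not the authors' implementation. It assumes the 'power distribution' means the base model's sequence probability raised to a power `alpha` and renormalized, that a cut position is drawn with probability proportional to next-token entropy, and that the suffix is then resampled from the base model. The toy model `next_token_probs`, the trace length `LENGTH`, and all function names are hypothetical stand-ins chosen so the example runs without a real language model.

```python
import itertools
import math
import random

LENGTH = 4  # toy trace length (assumption, not from the paper)

def next_token_probs(prefix):
    """Hypothetical toy base model over tokens {0, 1}."""
    if len(prefix) % 2 == 0:
        return [0.6, 0.4]  # high-entropy "decision" position
    # Follow-up token mostly copies the preceding decision (lower entropy).
    return [0.95, 0.05] if prefix[-1] == 0 else [0.3, 0.7]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def cut_weights(x):
    """Unnormalized cut probabilities: next-token entropy at each position."""
    return [entropy(next_token_probs(x[:t])) for t in range(len(x))]

def suffix_logprob(x, t):
    """Base-model log-probability of tokens x[t:] given the prefix x[:t]."""
    return sum(math.log(next_token_probs(x[:i])[x[i]]) for i in range(t, len(x)))

def sample_suffix(prefix, rng):
    """Complete a prefix to a full trace by sampling from the base model."""
    x = list(prefix)
    while len(x) < LENGTH:
        x.append(rng.choices([0, 1], weights=next_token_probs(x))[0])
    return tuple(x)

def ecmh_step(x, alpha, rng):
    """One MH step targeting p(x)**alpha with an entropy-weighted cut."""
    w_old = cut_weights(x)
    t = rng.choices(range(LENGTH), weights=w_old)[0]
    y = sample_suffix(x[:t], rng)
    w_new = cut_weights(y)
    # The shared prefix cancels; the proposal density removes one power of p
    # from the suffix ratio, leaving an exponent of (alpha - 1).
    log_ratio = (alpha - 1.0) * (suffix_logprob(y, t) - suffix_logprob(x, t))
    # Correction for the state-dependent probability of choosing cut t.
    log_ratio += math.log(w_new[t] / sum(w_new)) - math.log(w_old[t] / sum(w_old))
    return y if rng.random() < math.exp(min(0.0, log_ratio)) else x

def run_chain(alpha, steps, seed=0):
    """Run the chain and return empirical visit frequencies per trace."""
    rng = random.Random(seed)
    x = sample_suffix((), rng)
    counts = {}
    for _ in range(steps):
        x = ecmh_step(x, alpha, rng)
        counts[x] = counts.get(x, 0) + 1
    return {s: c / steps for s, c in counts.items()}

def exact_power_distribution(alpha):
    """Brute-force target for the toy model, used only as a sanity check."""
    seqs = list(itertools.product([0, 1], repeat=LENGTH))
    weights = [math.exp(alpha * suffix_logprob(s, 0)) for s in seqs]
    z = sum(weights)
    return {s: w / z for s, w in zip(seqs, weights)}
```

Calling `run_chain(alpha=2.0, steps=40000)` and comparing the result with `exact_power_distribution(2.0)` is one way to check that the chain targets the intended distribution on this toy model. With a real language model the brute-force enumeration is unavailable, and both the entropies and the suffix log-probabilities would come from the model's logits.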