Entity · technique

Best-of-N Sampling

technique · active · best-of-n-sampling-53c64a2c · 4 events · first seen May 17, 2026

Aliases: Best-of-N Sampling


More like this (12)

Best-of-N · Top-p sampling · Neural Super Sampling · importance sampling · Gibbs Sampling · Min-p sampling · Max-Pooling · Any-Dimensional Learning by Sampling · Bayesian Optimization · finite-sample posterior sampling framework · Random Coding · Graph Sparse Sampling

Recent events (4)

5 · arXiv · cs.CL · Jun 9, 2026

GGRO: Gradient-Guided Reward Optimization for inference-time LLM alignment

Researchers introduce Gradient-Guided Reward Optimization (GGRO), an inference-time alignment method that uses gradient signals from a reward model to inject 'nudging tokens' at high-uncertainty decoding steps, rather than relying on sampling-intensive re-ranking approaches like Best-of-N. The method monitors token-level entropy to detect distribution drift and steers generation trajectories directly, claiming improved robustness to reward hacking with minimal computational overhead. Experiments show gains across safety, helpfulness, and reasoning benchmarks compared to standard inference-time alignment baselines.

Inference Economics · Alignment and RLHF · Best-of-N Sampling · Gradient-Guided Reward Optimization
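For context on the baseline GGRO is contrasted with, here is a minimal sketch of Best-of-N re-ranking: draw N complete candidates, score each with a reward model, return the top one. The names `best_of_n`, `toy_generate`, and `toy_reward` are hypothetical stand-ins (a real setup would call an LLM sampler and a learned reward model); this is an illustration of the general technique, not code from the paper.

```python
import random
from typing import Callable, List, Tuple

def best_of_n(
    generate: Callable[[str, random.Random], str],
    reward: Callable[[str, str], float],
    prompt: str,
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidates independently, score each, keep the top one."""
    if n < 1:
        raise ValueError("n must be >= 1")
    rng = random.Random(seed)
    # Every candidate is generated in full and scored, so cost grows linearly with n.
    candidates: List[str] = [generate(prompt, rng) for _ in range(n)]
    scored = [(reward(prompt, c), c) for c in candidates]
    # Re-rank: the reward model alone decides which sample is returned.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score

# Hypothetical toy stand-ins: a "sampler" that picks a canned reply and a
# "reward model" that simply prefers longer replies.
REPLIES = ["ok", "fine", "a helpful reply", "a thorough, helpful reply"]

def toy_generate(prompt: str, rng: random.Random) -> str:
    return rng.choice(REPLIES)

def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))

best, score = best_of_n(toy_generate, toy_reward, "Explain best-of-N.", n=8)
print(best, score)
```

Because the toy reward only measures length, the selected reply inherits that bias; in general, raising N pushes harder on whatever the reward model favors, including its flaws, which is the reward-hacking risk the GGRO summary mentions.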

7 · arXiv · cs.CL · May 28, 2026

Bidirectional Evolutionary Search (BES) for Self-Improving Language Models

BES is a search framework that combines forward evolutionary candidate generation with backward goal decomposition to address limitations of best-of-N and tree search methods. Forward search uses recombination operators to escape the narrow entropy shell of autoregressive expansion, while backward search recursively decomposes tasks into checkable subgoals for dense intermediate feedback. Theoretical analysis shows evolutionary operators can escape entropy-shell confinement and backward search can exponentially reduce required samples. Experiments demonstrate consistent gains on post-training tasks where mainstream algorithms fail, and superior performance on three open problem-solving benchmarks at inference time.

Evaluation and Benchmarking · Inference Economics · Embodied Minds Lab · Best-of-N Sampling · tree search · +4 more
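A toy model helps show why checkable subgoals can cut the number of samples so sharply. Assume (hypothetically; this is not the paper's analysis) a task needs k independent steps that each succeed with probability q. If only the final answer can be checked, as in plain best-of-N, each full sample succeeds with probability q^k, so the expected number of samples is 1/q^k. If every subgoal can be verified on its own, each step is retried separately, for an expected k/q samples in total.

```python
def expected_samples_whole(q: float, k: int) -> float:
    """Only the final answer is checkable: geometric wait with success prob q**k."""
    if not 0 < q <= 1 or k < 1:
        raise ValueError("need 0 < q <= 1 and k >= 1")
    return 1.0 / (q ** k)

def expected_samples_decomposed(q: float, k: int) -> float:
    """Each of k subgoals is checked separately: k geometric waits of mean 1/q."""
    if not 0 < q <= 1 or k < 1:
        raise ValueError("need 0 < q <= 1 and k >= 1")
    return k / q

for k in (2, 5, 10):
    print(k, expected_samples_whole(0.5, k), expected_samples_decomposed(0.5, k))
# 2 4.0 4.0
# 5 32.0 10.0
# 10 1024.0 20.0
```

In this toy setting the whole-trajectory cost grows exponentially in k while the decomposed cost grows linearly, which matches the direction of the claim above, though the paper's actual conditions and bounds may differ.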

7 · arXiv · cs.CL · May 27, 2026

Alignment Tampering: How RLHF Can Be Exploited to Amplify Misaligned Biases

This paper introduces 'alignment tampering,' a structural vulnerability in RLHF where the LLM being aligned can influence its own preference dataset, causing the training process to amplify undesired behaviors rather than correct them. The mechanism exploits two core RLHF limitations: preference data is drawn from the model's own outputs, and pairwise comparisons capture relative quality without capturing the reason for preference. Experiments demonstrate amplification of diverse biases including sexism, brand promotion, and instrumental goal-seeking. Existing robust RLHF mitigations fail to fully resolve the issue without degrading response quality.

Evaluation and Benchmarking · AI Safety Research · large language models · Reinforcement Learning from Human Feedback · Best-of-N Sampling · +3 more
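To make the second limitation concrete, here is a sketch of the Bradley-Terry pairwise objective that is commonly used to train RLHF reward models. The paper's exact objective is not given in the summary above, so treat this as an assumed, standard setup. The loss depends only on the reward margin between the chosen and rejected response, so the label records which response won but nothing about why.

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood of "chosen beats rejected".

    P(chosen > rejected) = sigmoid(r_chosen - r_rejected), so the loss is
    -log sigmoid(d) = log(1 + exp(-d)) with d = r_chosen - r_rejected.
    """
    d = r_chosen - r_rejected
    # Numerically stable log(1 + exp(-d)) for both signs of d.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

# Only the margin matters: shifting both rewards leaves the loss unchanged.
print(bt_loss(2.0, 1.0) == bt_loss(10.0, 9.0))  # True
print(round(bt_loss(0.0, 0.0), 4))  # 0.6931, i.e. ln 2 when the rewards tie
```

Any feature that happens to co-occur with the chosen responses (a brand mention, for instance) lowers this loss just as well as genuine quality does, which is one way self-generated preference data could reinforce an unwanted bias.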

6 · Berkeley AI Research (BAIR) Blog · May 17, 2026

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

A BAIR blog post surveys recent progress in parallel reasoning for LLMs, covering methods from simple self-consistency and Best-of-N sampling through structured search (Tree of Thoughts, MCTS) to newer adaptive approaches including ParaThinker, GroupThink, and Hogwild! Inference. The core motivation is that sequential reasoning scales linearly with exploration depth, causing latency, context-rot, and compute inefficiency. Adaptive parallel reasoning aims to let models themselves decide when and how to decompose tasks into concurrent threads, rather than imposing fixed parallel structure externally. The post frames this as an emerging inference-time scaling paradigm with implications for agentic and complex reasoning workloads.

Long Context Evolution · Frontier Model Releases · ParaThinker · Berkeley AI Research (BAIR) · DeepSeek V4 · +11 more
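The two simplest parallel methods the post starts from differ mainly in how they aggregate N samples. Self-consistency takes a majority vote over final answers and needs no scorer, while Best-of-N trusts an external scorer to pick one sample. A minimal sketch follows; the sample answers and the `scores` table are made up purely for illustration.

```python
from collections import Counter
from typing import Sequence

def self_consistency(answers: Sequence[str]) -> str:
    """Majority vote over the final answers of N parallel samples."""
    if not answers:
        raise ValueError("need at least one answer")
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]

samples = ["42", "41", "42", "40", "42"]
print(self_consistency(samples))  # 42

# Best-of-N instead trusts an external scorer; these scores are hypothetical.
scores = {"42": 0.2, "41": 0.9, "40": 0.1}
print(max(samples, key=lambda a: scores[a]))  # 41
```

The two rules can disagree on the same samples, and both treat every sample as an independent full trajectory, which is the fixed structure that adaptive parallel reasoning aims to move beyond.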