Entity · technique

pass@k

technique · active · pass-k-3286a681 · 2 events · first seen May 20, 2026

Aliases: pass@k

Co-occurring entities

GRPO · AlphaEvolve · best@k · Vector Policy Optimization · GPT-3 · OpenAI · HumanEval · Codex

More like this (12)

Pass@1 · best@k · Kilo Code · KV Cache · KVPress · swap-KL · K-Search · KPMG · SkillGate · P-K-GCN · backdoor attack · GATE

Recent events (2)

7 · arXiv · cs.AI · May 22, 2026 · source ↗

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Vector Policy Optimization (VPO) is a new RL post-training algorithm for LLMs that replaces the scalar reward paradigm with vector-valued rewards, explicitly training models to produce diverse solution sets that specialize across different reward trade-offs. VPO is designed as a near-drop-in replacement for the GRPO advantage estimator and targets inference-scaling search procedures like AlphaEvolve. Across four tasks, VPO matches or outperforms scalar RL baselines on pass@k and best@k metrics, with advantages growing as search budget increases, and unlocks evolutionary search problems that GRPO-trained models cannot solve. The paper argues that diversity-optimized post-training may need to become the default as inference-time search becomes standard.

Evaluation and Benchmarking · Inference Economics · GRPO · pass@k · AlphaEvolve · +4 more

8 · OpenAI Blog · May 20, 2026 · source ↗

Evaluating Large Language Models Trained on Code

OpenAI published research on evaluating large language models trained on code, introducing the Codex model and the HumanEval benchmark for assessing code generation. The work established a foundational methodology for measuring the functional correctness of LLM-generated code with the pass@k metric: the probability that at least one of k sampled solutions to a problem passes its unit tests. The paper became a landmark reference for code-focused LLM evaluation and influenced subsequent code generation research across the field.

Frontier Model Releases Evaluation and Benchmarking GPT-3 pass@k OpenAI +3 more
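
For readers who want to see how pass@k is typically computed, the following is a minimal sketch of the unbiased estimator associated with the Codex/HumanEval work: draw n samples per problem, count the c samples that pass the unit tests, and estimate pass@k as 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `mean_pass_at_k` and the `(n, c)` input format are assumptions made for illustration; they are not taken from the sources listed on this page.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the unit tests
    k: sampling budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets made up only of failing
    # samples; it is 0 when fewer than k samples failed.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given hypothetical (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to the fraction of passing samples, c / n. Larger k rewards a model whose samples are diverse enough that at least one of them succeeds, which is why pass@k appears alongside best@k in work on test-time search.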