Almanac
technique

pass@k

techniqueactivepass-k-3286a681·2 events·first seen 28d ago

Aliases: pass@k

Co-occurring entities

More like this (12)

Recent events (2)

8Openai Blog·28d ago·source ↗

Evaluating Large Language Models Trained on Code

OpenAI published research on evaluating large language models trained on code, introducing the Codex model and the HumanEval benchmark for assessing code generation capabilities. The work established foundational methodology for measuring functional correctness of code produced by LLMs using a pass@k metric. This paper became a landmark reference for code-focused LLM evaluation and influenced subsequent code generation research across the field.

7arXiv · cs.AI·26d ago·source ↗

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Vector Policy Optimization (VPO) is a new RL post-training algorithm for LLMs that replaces the scalar reward paradigm with vector-valued rewards, explicitly training models to produce diverse solution sets that specialize across different reward trade-offs. VPO is designed as a near-drop-in replacement for the GRPO advantage estimator and targets inference-scaling search procedures like AlphaEvolve. Across four tasks, VPO matches or outperforms scalar RL baselines on pass@k and best@k metrics, with advantages growing as search budget increases, and unlocks evolutionary search problems that GRPO-trained models cannot solve. The paper argues that diversity-optimized post-training may need to become the default as inference-time search becomes standard.