Entity · benchmark

LCB/hard benchmark

benchmark · active · lcb-hard-benchmark-95d75582 · 1 event · first seen May 28, 2026

Aliases: LCB/hard benchmark

Co-occurring entities

Competitive Programming RL · LeetCode Hard (LCB/hard) · Correctness-Efficiency Frontier · 7B language model · Model Merging / Weight Interpolation · Reinforcement Learning for Code · Extrapolative Weight Averaging · Nested Unit-Test Coverage · 32B language model

More like this (12)

LCB · LeetCode Hard (LCB/hard) · CORE benchmark · LKvaluesBench · Multi-LCB · OpenSCAD Architectural 3D LLM Benchmark · DPG Benchmark · Hebbia Finance Benchmark · harness-level benchmarks · KernelBench · SlopCodeBench · LabBench

Recent events (1)

6 · arXiv · cs.AI · May 28, 2026

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

This paper investigates whether extrapolative weight averaging of RL-trained checkpoints can extend Pareto frontiers between competing objectives (correctness vs. computational efficiency) without additional training. Starting from a shared initialization, the authors train checkpoints under nested unit-test coverage regimes for competitive programming tasks, revealing a correctness-efficiency frontier where higher-coverage rewards reduce optimization failures but increase correctness failures. Extrapolation beyond trained endpoints produces complementary policies that, when ensembled, improve pass@250 on LCB/hard by 3.3% over the best single checkpoint at matched sample budget. Results hold across 7B and 32B model scales and three inference settings: pure reasoning, tool use, and agentic coding.
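The summary does not give the paper's exact merging formula, so the following is only a minimal sketch of what extrapolative weight averaging commonly means: a linear combination of two checkpoints that share an initialization, theta(alpha) = theta_a + alpha * (theta_b - theta_a), evaluated with alpha outside the interval [0, 1]. The function name `extrapolate`, the toy checkpoints `low_cov` and `high_cov`, and the alpha values are all hypothetical; real checkpoints would be tensors rather than Python lists.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def extrapolate(theta_a: Weights, theta_b: Weights, alpha: float) -> Weights:
    """Return theta_a + alpha * (theta_b - theta_a) for every parameter.

    alpha in [0, 1] interpolates between the two checkpoints;
    alpha < 0 or alpha > 1 extrapolates beyond one of the endpoints.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name, a in theta_a.items():
        b = theta_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [x + alpha * (y - x) for x, y in zip(a, b)]
    return merged


# Hypothetical endpoints: one trained with low unit-test coverage rewards,
# one with high coverage rewards (values are made up for illustration).
low_cov = {"w": [1.0, 2.0], "b": [0.0]}
high_cov = {"w": [2.0, 4.0], "b": [1.0]}

beyond_high = extrapolate(low_cov, high_cov, 1.5)   # past the high-coverage end
beyond_low = extrapolate(low_cov, high_cov, -0.5)   # past the low-coverage end
```

Under this assumption, the two extrapolated policies sit on opposite sides of the trained endpoints, which is one way to read the summary's claim that they are complementary when ensembled.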

Evaluation and Benchmarking · Inference Economics · LCB/hard benchmark · Competitive Programming RL · LeetCode Hard (LCB/hard) · +9 more