
Model Merging / Weight Interpolation

technique · active · provisional · model-merging-weight-interpolation-0500774e · 1 event · first seen 20d ago

Aliases: Model Merging / Weight Interpolation

Recent events (1)

6 · arXiv · cs.AI · 20d ago

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

This paper investigates whether extrapolative weight averaging of RL-trained checkpoints can extend Pareto frontiers between competing objectives (correctness vs. computational efficiency) without additional training. Starting from a shared initialization, the authors train checkpoints under nested unit-test coverage regimes for competitive programming tasks, revealing a correctness-efficiency frontier where higher-coverage rewards reduce optimization failures but increase correctness failures. Extrapolation beyond trained endpoints produces complementary policies that, when ensembled, improve pass@250 on LCB/hard by 3.3% over the best single checkpoint at matched sample budget. Results hold across 7B and 32B model scales and three inference settings: pure reasoning, tool use, and agentic coding.
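
The summary above does not give the paper's exact merging formula. As a rough illustration only, the sketch below assumes the common linear form θ(α) = θ_a + α(θ_b − θ_a) applied to two checkpoints that share an initialization, where 0 ≤ α ≤ 1 interpolates between them and α outside that range extrapolates beyond a trained endpoint. The function name `interpolate_weights`, the toy parameter names, and the use of plain Python lists in place of real tensors are all hypothetical choices made for the example.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def interpolate_weights(theta_a: StateDict, theta_b: StateDict, alpha: float) -> StateDict:
    """Return theta_a + alpha * (theta_b - theta_a), parameter by parameter.

    0 <= alpha <= 1 interpolates between the two checkpoints;
    alpha < 0 or alpha > 1 extrapolates past one of them.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: StateDict = {}
    for name, wa in theta_a.items():
        wb = theta_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [a + alpha * (b - a) for a, b in zip(wa, wb)]
    return merged


# Hypothetical checkpoints trained under lower and higher unit-test coverage.
low_cov = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
high_cov = {"layer.weight": [3.0, 2.0], "layer.bias": [1.0]}

midpoint = interpolate_weights(low_cov, high_cov, 0.5)  # interpolation
beyond = interpolate_weights(low_cov, high_cov, 1.5)    # extrapolation past high_cov
```

With real checkpoints the same arithmetic would be applied to each tensor in the two state dicts. Which values of α are useful is an empirical question that this sketch does not address.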