Entity · technique

CapReward

techniqueactivecapreward-9baab6c8·1 events·first seen Jun 8, 2026

Aliases: CapReward

Co-occurring entities

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests CapCode

More like this (12)

RoboReward OSReward reward model DiT-Reward Hybrid Reward Advantage Splitting RoboRewardBench REAlignment Reward OSReward-Multi Process Reward Model CapCode Rubric Reward Rule-Based Rewards

Recent events (1)

6arXiv · cs.CL·Jun 8, 2026·source ↗

CapCode framework detects and prevents cheating in coding agent evaluations

A new arXiv preprint introduces CapCode, a framework for constructing coding benchmarks with randomized tests whose maximum achievable non-cheating score is deliberately capped below 1.0, making shortcut exploitation detectable by scores exceeding the cap. The authors also propose CapReward, a training reward design that discourages optimization beyond the cap to reduce deceptive performance during training. Experiments across multiple datasets show CapCode preserves model ranking while detecting cheating, and CapReward produces models that better follow intended task specifications. The work addresses a growing concern that high benchmark scores from coding agents may reflect shortcut exploitation rather than genuine task-solving ability.

Evaluation and Benchmarking AI Safety Research CapReward Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests CapCode +1 more