All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code

paper · active · provisional · all-smoke-no-alarm-oracle-signals-in-agent-authored-test-code-429b3dc1 · 1 event · first seen 5h ago

Recent events (1)

arXiv · cs.AI · 5h ago

Empirical study finds 80% of AI agent-authored test patches lack meaningful verification logic

A large-scale empirical study of 86,156 test-file patches from 33,596 agent-authored GitHub PRs finds that 80.2% contain weak or no explicit oracle signals, meaning they execute code without verifying its behavior. The study covers five coding agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code) across 2,807 repositories and introduces a syntactic taxonomy of eight oracle signal categories. Although patches with strong oracles show lower raw merge rates, regression analysis finds that strong oracles are associated with significantly higher odds of merging (OR=1.28). This suggests that current quality gates based on test-file presence substantially overestimate verification strength.
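
To make the distinction between executing code and verifying behavior concrete, here is a minimal sketch of a syntactic oracle check for Python test code. It is not the study's method: the summary does not list the eight categories, so the three signal names below (`assert_statement`, `assert_method`, `expected_exception`) and the helper `oracle_signals` are hypothetical and chosen only for illustration.

```python
import ast


def oracle_signals(test_source: str) -> set[str]:
    """Return the illustrative oracle signals found in a piece of test source.

    The signal names are assumptions for this sketch, not the study's taxonomy.
    """
    signals = set()
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Assert):
            # Plain `assert expr` statement, as used in pytest-style tests.
            signals.add("assert_statement")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            name = node.func.attr
            if name.startswith("assert"):
                # Method-style assertion, e.g. self.assertEqual(...).
                signals.add("assert_method")
            elif name == "raises":
                # Expected-exception check, e.g. pytest.raises(...).
                signals.add("expected_exception")
    return signals


# A smoke test: it calls the code under test but checks nothing.
SMOKE_TEST = """
def test_parse_runs():
    parse("a=1")
"""

# A test with an explicit oracle: it compares the result to an expected value.
ORACLE_TEST = """
def test_parse_value():
    assert parse("a=1") == {"a": "1"}
"""
```

Under these assumptions, `oracle_signals(SMOKE_TEST)` returns an empty set while `oracle_signals(ORACLE_TEST)` reports an assert statement. A gate that only checks whether a test file exists would treat both tests the same, which is the overestimation the summary describes.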