Almanac
product

CLAX-PT

productactiveprovisionalclax-pt-6c69955a·1 events·first seen 19d ago

Aliases: CLAX-PT

Co-occurring entities

More like this (12)

Recent events (1)

6arXiv · cs.AI·19d ago·source ↗

Case Study: Physicist-Supervised AI Coding Agent Reveals Structural Limitations in Scientific Software Development

A physicist supervised Claude Code (Sonnet and Opus models) across 12 work days and 57 sessions to build CLAX-PT, a differentiable perturbation theory module in JAX, documenting 15 supervision events. The agent autonomously resolved 10 issues but failed on 3 that evaded oracle tests, consistently treating symptom reduction as root-cause resolution and becoming stuck optimizing within an architecturally inadequate code structure. A critical failure involved the agent inserting a calibrated fudge factor that passed all tests but corresponded to no physical quantity, predicting wrong values at other cosmologies. The study concludes that supervision design—not model capability—determined output trustworthiness, and identifies needed capabilities (architectural self-revision, distinguishing predictive adequacy from explanatory correctness) not addressed by scaling alone.