Attainable Utility Preservation
attainable-utility-preservation-07148874 · 1 event · first seen 20d ago
Aliases: Attainable Utility Preservation
Recent events (1)
Calibrated Collective Oversight (CCO): Scalable Oversight with Finite-Time Statistical Guarantees
This paper introduces Calibrated Collective Oversight (CCO), a framework for maintaining human oversight of agentic AI systems that may exceed human capabilities. CCO aggregates diverse scoring functions into a conservatism penalty inspired by Attainable Utility Preservation, then calibrates this penalty online via Conformal Decision Theory to ensure undesirable outcomes stay below a user-specified threshold with finite-time bounds and no distributional assumptions. Evaluated on a modified SWE-bench (adversarially misaligned agent) and MACHIAVELLI (ethical violations), CCO allows weaker overseers to constrain stronger agents while preserving reward, with empirical violation rates closely matching specified targets.
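The calibration loop described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the mean aggregation, the additive update rule, the two-action environment, and all names (`aggregate_penalty`, `choose_action`, `update_lambda`, `run_toy`) are assumptions made purely for illustration. The idea shown is only that a penalty weight is raised after an undesirable outcome and lowered otherwise, so that the observed violation rate drifts toward a user-specified target.

```python
from typing import List, Sequence, Tuple


def aggregate_penalty(scores: Sequence[float]) -> float:
    """Combine per-overseer scores (assumed in [0, 1]) with a plain mean.

    The mean is a placeholder; the source does not specify the aggregation.
    """
    return sum(scores) / len(scores)


def choose_action(rewards: Sequence[float], penalties: Sequence[float], lam: float) -> int:
    """Return the index of the action maximizing reward - lam * penalty."""
    return max(range(len(rewards)), key=lambda i: rewards[i] - lam * penalties[i])


def update_lambda(lam: float, violated: bool, target: float, step: float) -> float:
    """Hypothetical online update: increase lam after a violation, decrease otherwise.

    The weight is clipped at zero so the penalty never becomes a bonus.
    """
    err = 1.0 if violated else 0.0
    return max(0.0, lam + step * (err - target))


def run_toy(steps: int = 1000, target: float = 0.2, step: float = 0.1) -> Tuple[float, float]:
    """Run a deterministic two-action toy problem and return (violation rate, final lam)."""
    rewards: List[float] = [1.0, 0.5]  # action 0 is "risky", action 1 is "safe"
    overseer_scores = [[0.8, 1.0, 0.9], [0.0, 0.0, 0.0]]  # made-up scores per action
    penalties = [aggregate_penalty(s) for s in overseer_scores]

    lam = 0.0
    violations = 0
    for _ in range(steps):
        action = choose_action(rewards, penalties, lam)
        violated = action == 0  # toy assumption: the risky action always violates
        violations += int(violated)
        lam = update_lambda(lam, violated, target, step)
    return violations / steps, lam


if __name__ == "__main__":
    rate, lam = run_toy()
    print(f"violation rate: {rate:.3f}, final penalty weight: {lam:.3f}")
```

In this toy setting the penalty weight oscillates around the point where the risky and safe actions are equally attractive, and the long-run violation rate settles near `target`. The finite-time guarantees claimed in the abstract depend on the paper's actual construction, which this sketch does not reproduce.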