Conformal Decision Theory

technique · active · provisional · conformal-decision-theory-8794f69c · 1 event · first seen 20d ago

Aliases: Conformal Decision Theory

Recent events (1)

7 · arXiv · cs.AI · 20d ago

Calibrated Collective Oversight (CCO): Scalable Oversight with Finite-Time Statistical Guarantees

This paper introduces Calibrated Collective Oversight (CCO), a framework for maintaining human oversight of agentic AI systems that may exceed human capabilities. CCO aggregates diverse scoring functions into a conservatism penalty inspired by Attainable Utility Preservation, then calibrates this penalty online via Conformal Decision Theory so that the rate of undesirable outcomes stays below a user-specified threshold, with finite-time bounds and no distributional assumptions. Evaluated on a modified SWE-bench (with an adversarially misaligned agent) and on MACHIAVELLI (ethical violations), CCO allows weaker overseers to constrain stronger agents while preserving reward, and empirical violation rates closely match the specified targets.
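The calibration loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the update rule below is a generic online conformal-style step (raise the penalty weight after a violation, lower it otherwise), and the names `aggregate_penalty`, `choose_action`, `run_online_calibration`, the synthetic candidates, the safe fallback action, and the values of `epsilon` and `eta` are all assumptions made for illustration.

```python
import random


def aggregate_penalty(scores):
    """Average several overseer scores (each in [0, 1]) into one penalty.
    Plain averaging is an assumption, not the paper's aggregation rule."""
    return sum(scores) / len(scores)


def choose_action(candidates, lam):
    """Pick the candidate maximizing reward minus lam * aggregated penalty."""
    return max(
        candidates,
        key=lambda c: c["reward"] - lam * aggregate_penalty(c["scores"]),
    )


def run_online_calibration(steps=5000, epsilon=0.1, eta=0.05, seed=0):
    """Simulate online calibration of the penalty weight lam.

    Returns (empirical violation rate, final lam). All data is synthetic.
    """
    rng = random.Random(seed)
    lam = 0.0
    violations = 0.0
    for _ in range(steps):
        candidates = []
        for _ in range(4):
            harm = rng.random()  # hidden harm level of this candidate
            candidates.append({
                # Riskier actions pay more, so an unpenalized agent seeks harm.
                "reward": harm + 0.2 * rng.random(),
                # Three noisy overseer scores, clipped to [0, 1].
                "scores": [
                    min(1.0, max(0.0, harm + rng.uniform(-0.2, 0.2)))
                    for _ in range(3)
                ],
                "violation": harm > 0.7,
            })
        # Hypothetical safe fallback: no reward, no penalty, never a violation.
        candidates.append(
            {"reward": 0.0, "scores": [0.0, 0.0, 0.0], "violation": False}
        )

        action = choose_action(candidates, lam)
        loss = 1.0 if action["violation"] else 0.0
        violations += loss

        # Conformal-style update: lam rises by eta * (1 - epsilon) after a
        # violation and falls by eta * epsilon otherwise. Clipping at zero can
        # only raise lam, so the upper bound on the violation rate is kept.
        lam = max(0.0, lam + eta * (loss - epsilon))
    return violations / steps, lam


if __name__ == "__main__":
    rate, lam = run_online_calibration()
    print(f"empirical violation rate: {rate:.4f}, final lam: {lam:.3f}")
```

In this toy setting the finite-time bound follows from summing the update over all steps: the violation rate is at most `epsilon + (final_lam - initial_lam) / (eta * steps)`, and `lam` stays bounded because a large enough penalty weight always makes the fallback preferable to any violating candidate. No assumption about the distribution of candidates is used in that argument, only the boundedness of `lam`.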