Almanac
technique

Clopper-Pearson

techniqueactiveprovisionalclopper-pearson-1a2442a8·1 events·first seen 7d ago

Aliases: Clopper-Pearson

Co-occurring entities

More like this (12)

Recent events (1)

7arXiv · cs.AI·7d ago·source ↗

Co-failure ceiling theorem bounds maximum gains from LLM routing, voting, and mixture-of-agents across 67 frontier models

A new arXiv paper introduces the concept of a 'co-failure ceiling' — the rate at which all models in an ensemble fail on the same query — and proves that no routing, voting, or cascade policy can exceed accuracy of (1 - beta) where beta is this all-wrong rate. Empirically evaluated across 67 models from 21 providers, the paper finds that standard pairwise error correlation metrics systematically underprice the co-failure tail by ~2.5x on open-ended mathematics, and that combining models rarely beats the single best model without strong query-level routing signals. The work provides a finite-sample certificate (via Clopper-Pearson bounds) for the maximum achievable gain from multi-model systems before training a router, and identifies answer format rather than subject matter as a key driver of co-failure on GPQA-Diamond.