KINA

benchmark · active · provisional · kina-f8032e66 · 1 event · first seen 13d ago

Aliases: KINA

Recent events (1)

6 · arXiv · cs.AI · 13d ago

KINA: 899-item knowledge benchmark across 261 disciplines with formal representativeness and annotation incentive guarantees

KINA (Knowledge Index of Noah's Ark) is a new 899-item LLM benchmark spanning 261 fine-grained disciplines. It targets three methodological weaknesses in existing knowledge benchmarks: poor disciplinary representativeness, flat-payment annotation incentives, and unaudited ranking instability. The authors provide two formal results: a (1-1/e) greedy approximation guarantee for disciplinary coverage, and a proof that bonus-on-bar tournament payment weakly dominates flat payment for annotation quality. Across 42 models from 13 labs, the top performer, Gemini-3.1-Pro-Preview, reaches 53.17%, with Claude-Opus-4.6 and GPT-5.4 close behind. The leaderboard is tiered rather than smooth, and scores leave substantial headroom before saturation.
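
For readers unfamiliar with the (1-1/e) figure: it is the classic guarantee for greedy selection when the objective is monotone and submodular and the number of picks is capped. The summary does not state KINA's exact coverage objective, so the sketch below assumes the simplest case, plain maximum coverage, where each candidate item is tagged with the disciplines it touches. The function name `greedy_max_coverage`, the item IDs, and the discipline tags are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(
    candidates: Dict[Hashable, Set[Hashable]], budget: int
) -> List[Hashable]:
    """Pick up to `budget` candidates, always taking the largest marginal gain.

    Assumption: the objective is the number of distinct disciplines covered,
    which is monotone and submodular, so greedy covers at least a (1 - 1/e)
    fraction of what the best possible selection of `budget` items covers.
    """
    covered: Set[Hashable] = set()
    chosen: List[Hashable] = []
    remaining = dict(candidates)  # shallow copy so the caller's dict is untouched

    for _ in range(budget):
        # Candidate that adds the most not-yet-covered disciplines.
        best = max(remaining, key=lambda k: len(remaining[k] - covered), default=None)
        if best is None or not (remaining[best] - covered):
            break  # nothing left, or no candidate adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen


# Hypothetical toy pool: item ID -> disciplines it touches.
pool = {
    "q1": {"algebra", "topology"},
    "q2": {"topology"},
    "q3": {"botany", "zoology", "ecology"},
    "q4": {"algebra", "botany"},
}
picked = greedy_max_coverage(pool, budget=2)
covered = set().union(*(pool[k] for k in picked))
print(picked, len(covered))
```

On this toy pool the greedy pass takes `q3` first (three new disciplines) and then `q1` (two new disciplines), covering all five tags with two items. The paper's actual objective may weight disciplines or add constraints that this sketch does not model.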