Entity · benchmark

KINA

benchmark · active · kina-f8032e66 · 1 event · first seen Jun 4, 2026

Aliases: KINA

Co-occurring entities

Claude Opus 4.6 · Google · OpenAI · Gemini-3.1-Pro · GPT-5.5 · Anthropic

More like this (12)

Chinese Mainland · ChinaTalk · PG-KINN · China · People's Republic of China · KAISEN · Chinese CLIP · NoXi · Mandarin Chinese · LIMA · CADE · Peking University

Recent events (1)

6 · arXiv · cs.AI · Jun 4, 2026

KINA: 899-item knowledge benchmark across 261 disciplines with formal representativeness and annotation incentive guarantees

KINA (Knowledge Index of Noah's Ark) is a new 899-item LLM benchmark spanning 261 fine-grained disciplines. It targets three methodological weaknesses in existing knowledge benchmarks: poor disciplinary representativeness, flat-payment annotation incentives, and unaudited ranking instability. The authors provide two formal results: a greedy algorithm with a (1 - 1/e) approximation guarantee for disciplinary coverage, and a proof that bonus-on-bar tournament payment weakly dominates flat payment for annotation quality. Across 42 models from 13 labs, the top performer, Gemini-3.1-Pro-Preview, reaches 53.17%, with Claude-Opus-4.6 and GPT-5.4 close behind. The leaderboard is tiered rather than smooth, and even the best score leaves substantial headroom before saturation.
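For readers unfamiliar with the (1 - 1/e) figure: it is the classic guarantee for greedily maximizing a monotone submodular objective, such as set coverage, under a cardinality budget. The summary does not describe the authors' actual selection procedure, so the following is only a minimal sketch of generic greedy maximum coverage. The function name `greedy_cover`, the item IDs, and the discipline labels are all hypothetical.

```python
def greedy_cover(candidates, budget):
    """Greedy maximum coverage (illustrative, not the paper's algorithm).

    candidates: dict mapping an item ID to the set of disciplines it touches.
    budget: maximum number of items to select.
    Each round picks the item that adds the most not-yet-covered disciplines.
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)  # copy so the caller's dict is untouched
    for _ in range(budget):
        # Item with the largest marginal gain; ties go to the earliest entry.
        best = max(
            remaining,
            key=lambda name: len(remaining[name] - covered),
            default=None,
        )
        if best is None or not (remaining[best] - covered):
            break  # nothing left that adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical toy pool: four candidate items tagged with disciplines.
items = {
    "q1": {"algebra", "topology"},
    "q2": {"topology", "virology", "ecology"},
    "q3": {"algebra"},
    "q4": {"ecology", "linguistics"},
}
chosen, covered = greedy_cover(items, budget=2)
print(chosen, len(covered))
```

With a budget of two items, the sketch first takes the item that covers three disciplines, then any item that adds one more. In general the greedy choice is guaranteed to cover at least a (1 - 1/e) fraction, roughly 63%, of what the best possible selection of the same size would cover.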

Frontier Model Releases · Evaluation and Benchmarking · Claude Opus 4.6 · KINA · Google