Entity · benchmark

GIM (Grounded Integration Measure)

benchmark · active · gim-grounded-integration-measure--1c8b5521 · 1 event · first seen May 19, 2026

Aliases: GIM (Grounded Integration Measure)

Co-occurring entities

2-Parameter Logistic IRT Model · test-time compute · HLE · GPQA · ARC-AGI

More like this (12)

GIC Inertial Measurement Unit (IMU) GIDD IMO GGML GPIC benchmark NBIM AGI IVGT GIFT-Eval Expected Improvement FigSIM

Recent events (1)

arXiv · cs.CL · May 19, 2026

GIM: A Grounded Integration Measure Benchmark for Evaluating Multi-Domain Cognitive Coordination in LLMs

The Grounded Integration Measure (GIM) is a new LLM benchmark of 820 original problems designed to resist benchmark saturation. Its problems require integrating multiple cognitive operations (constraint satisfaction, state tracking, epistemic vigilance, audience calibration) over broadly accessible knowledge. Unlike knowledge-escalation benchmarks (GPQA, HLE) or pure abstraction benchmarks (ARC-AGI), GIM grounds reasoning in realistic tasks without gating on specialized expertise. The authors calibrate a 2-parameter logistic IRT model on more than 200k prompt-response pairs across 28 models and 47 test configurations, producing the most extensive published study of test-time compute versus model capability tradeoffs on a fixed benchmark. A key finding is that within-family configuration choices (thinking budget, quantization) matter as much as model selection.
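For readers unfamiliar with the model named in this summary, the standard 2-parameter logistic (2PL) IRT response function gives the probability of a correct response as 1 / (1 + exp(-a(θ - b))), where θ is the respondent's ability, b the item's difficulty, and a the item's discrimination. The sketch below illustrates only that textbook formula. Treating each model configuration as a "respondent" and each benchmark problem as an "item" is an assumption about how it would apply here, and the item names and parameter values are hypothetical, not taken from the GIM paper.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """Standard 2PL IRT response function.

    theta: respondent ability (assumed here: one model configuration)
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items as (discrimination, difficulty); illustrative values only.
items = {"easy_item": (1.2, -1.0), "hard_item": (0.8, 1.5)}

# Probability of a correct answer for three hypothetical ability levels.
for theta in (-1.0, 0.0, 1.0):
    probs = {name: round(p_correct(theta, a, b), 3)
             for name, (a, b) in items.items()}
    print(f"theta={theta:+.1f}", probs)
```

Under this formula, a respondent whose ability equals an item's difficulty has a 50% chance of answering correctly, and a larger a makes that probability change more steeply around b.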

Frontier Model Releases · Evaluation and Benchmarking · 2-Parameter Logistic IRT Model · GIM (Grounded Integration Measure) · test-time compute