benchmark
CRAG
benchmarkactiveprovisional
crag-9bdd0f96·1 events·first seen 2d agoAliases: CRAG
Co-occurring entities
More like this (12)
Recent events (1)
Tool-intent stabilization analysis quantifies when streaming RAG latency hiding is possible
A new arXiv paper introduces 'tool-intent stabilization' — the point in a streaming input at which a speculative retrieval query converges to the correct result — and measures its distribution on the CRAG benchmark (1,371 questions). The authors derive a model-agnostic bound on how much tool latency can be hidden behind remaining user input, finding that at realistic operating parameters 73.9% of queries admit substantial latency hiding. The study requires no model training and validates the bound against a working streaming pipeline, also identifying query properties that predict early versus late stabilization.