dataset
MedAlign
dataset · active · provisional
medalign-a036d683 · 1 event · first seen 25h ago
Aliases: MedAlign
Recent events (1)
Hop-count taxonomy predicts LLM failure on clinical EHR question answering across architectures
Researchers introduce a 'hop-count' taxonomy, defined as the number of distinct inferential steps required to answer a clinical EHR question, as a principled predictor of LLM failure. Accuracy declines monotonically with reasoning depth across Claude Sonnet, GPT-4o, and GPT-5. The pattern holds across two providers and two OpenAI generations, with odds ratios per hop of 0.58–0.80 (each additional hop multiplies the odds of a correct answer by that factor), and is not explained by EHR context truncation. Extended thinking (chain-of-thought) did not significantly flatten the accuracy-depth curve, though token usage scaled with hop count. The findings ground transformer compositionality limits in a clinically consequential domain and suggest hop count as a deployment risk-stratification tool.
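To make the reported per-hop odds ratios more concrete, the following is a minimal sketch of how an odds ratio compounds over hops, assuming a simple logistic relationship in which each additional hop multiplies the odds of a correct answer by a constant factor. The function name `accuracy_at_hop` and the baseline accuracy of 0.90 are hypothetical and chosen only for illustration; the summary does not report baseline accuracies, and this is not the authors' analysis code.

```python
def accuracy_at_hop(base_accuracy: float, odds_ratio: float, extra_hops: int) -> float:
    """Accuracy implied after `extra_hops` additional hops, assuming each hop
    multiplies the odds of a correct answer by `odds_ratio`."""
    # Convert the baseline probability to odds.
    odds = base_accuracy / (1.0 - base_accuracy)
    # Apply the per-hop odds ratio once per additional hop.
    odds *= odds_ratio ** extra_hops
    # Convert the resulting odds back to a probability.
    return odds / (1.0 + odds)


if __name__ == "__main__":
    BASE = 0.90  # hypothetical baseline accuracy, NOT taken from the source
    # 0.58 and 0.80 are the ends of the odds-ratio range quoted in the summary.
    for or_per_hop in (0.58, 0.80):
        curve = [round(accuracy_at_hop(BASE, or_per_hop, k), 3) for k in range(4)]
        print(f"odds ratio per hop = {or_per_hop}: {curve}")
```

Under these assumptions, an odds ratio below 1 yields a curve that falls with every added hop, and the lower end of the reported range (0.58) falls noticeably faster than the upper end (0.80). The actual shape depends on the true baseline, which this sketch does not know.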