When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors
More like this (12)
Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Can LLMs Judge Better Than They Generate? Evaluating Task Asymmetry, Mechanistic Interpretability and Transferability for In-Context QA
Towards Root Memories: Benchmarking and Enhancing Implicit Logical Memory Retrieval for Personalized LLMs
Measuring Epistemic Resilience of LLMs Under Misleading Medical Context
Flaws in the LLM Automation Narrative
LLM inference
Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research
datasette-llm
Can LLMs Reliably Self-Report Adversarial Prefills, and How?
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?
How reliable are LLMs when it comes to playing dice?
Recent events (1)
Systematic study of LLM data referencing errors in tables, with lightweight critic model mitigation
A new arXiv paper presents what the authors describe as the first systematic evaluation of data referencing errors (DREs), cases where a table value is cited incorrectly or omitted, across LLMs ranging from 1.7B to 20B parameters. The authors find DREs in every model and task they test, and these errors affect intermediate reasoning steps, not only final-answer accuracy. They show that a critic-based filtering and rejection sampling approach improves answer accuracy by up to 12%, and they train a lightweight 4B-parameter critic model that reaches 78.2% F1 at detecting DREs both in distribution and out of distribution.
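The summary does not say how the critic is invoked or how many samples are drawn, so the following is only a minimal sketch of generic critic-gated filtering and rejection sampling, assuming a caller-supplied `generate` function and a critic that returns `True` when it suspects a DRE. All names here are hypothetical, not taken from the paper. In particular, `toy_critic` is a rule-based stand-in for the paper's trained 4B critic: it only flags numbers in a response that appear nowhere in the table, so it cannot catch omissions and would wrongly flag values the model legitimately computed.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical table representation: a list of rows of cell strings.
Table = list[list[str]]

# Matches integers and decimals such as "120", "-3", or "3.5".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def table_values(table: Table) -> set[float]:
    """Collect every numeric value appearing in any cell (thousands commas stripped)."""
    values: set[float] = set()
    for row in table:
        for cell in row:
            for match in NUMBER.findall(cell.replace(",", "")):
                values.add(float(match))
    return values


def toy_critic(table: Table, response: str) -> bool:
    """Hypothetical stand-in for a learned DRE critic.

    Returns True (suspected referencing error) if the response mentions any
    number that does not occur in the table. It ignores omissions and will
    flag legitimately computed numbers, unlike a trained critic.
    """
    cited = {float(m) for m in NUMBER.findall(response.replace(",", ""))}
    return not cited <= table_values(table)


def filter_candidates(candidates: Iterable[str], critic: Callable[[str], bool]) -> list[str]:
    """Critic-based filtering: keep only candidates the critic does not flag."""
    return [c for c in candidates if not critic(c)]


def critic_rejection_sample(
    generate: Callable[[], str],
    critic: Callable[[str], bool],
    max_samples: int = 8,
) -> Optional[str]:
    """Rejection sampling: draw up to max_samples candidates and return the
    first one the critic accepts, or None if every candidate is rejected."""
    for _ in range(max_samples):
        candidate = generate()
        if not critic(candidate):
            return candidate
    return None


# Usage with a fixed sequence of candidates standing in for LLM samples.
table = [["Region", "Sales"], ["North", "120"], ["South", "95"]]
candidates = iter([
    "North sales were 125, so North leads.",  # misquotes 120 as 125
    "North sales were 120 versus 95 for South, so North leads.",
])
answer = critic_rejection_sample(
    generate=lambda: next(candidates),
    critic=lambda r: toy_critic(table, r),
)
print(answer)  # prints the second candidate; the first is rejected
```

Returning `None` when every candidate is rejected leaves the fallback policy (answering anyway, abstaining, or asking for more samples) to the caller. A learned critic that outputs a score instead of a boolean could be plugged in the same way by thresholding its score.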