paper

When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors

paper · active · provisional · when-llms-read-tables-carelessly-measuring-and-reducing-data-referencing-errors-17ebb19e · 1 event · first seen 2d ago


More like this (12)

Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Can LLMs Judge Better Than They Generate? Evaluating Task Asymmetry, Mechanistic Interpretability and Transferability for In-Context QA
Towards Root Memories: Benchmarking and Enhancing Implicit Logical Memory Retrieval for Personalized LLMs
Measuring Epistemic Resilience of LLMs Under Misleading Medical Context
Flaws in the LLM Automation Narrative
LLM inference
Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research
datasette-llm
Can LLMs Reliably Self-Report Adversarial Prefills, and How?
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?
How reliable are LLMs when it comes to playing dice?

Recent events (1)

arXiv · cs.AI · 2d ago

Systematic study of LLM data referencing errors in tables, with lightweight critic model mitigation

A new arXiv paper introduces the first systematic evaluation of data referencing errors (DREs), meaning table values that are cited incorrectly or omitted, across LLMs ranging from 1.7B to 20B parameters. The authors find that DREs are pervasive across all tested models and tasks, and that they corrupt intermediate reasoning steps, not just final answers. They show that critic-based filtering combined with rejection sampling improves answer accuracy by up to 12%, and they train a lightweight 4B critic model that reaches 78.2% F1 at detecting DREs in both in-distribution and out-of-distribution settings.
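To make the filtering idea concrete, here is a minimal Python sketch of rejection sampling gated by a critic. It is not the paper's implementation: the summary does not describe the critic's interface, so `toy_critic`, `rejection_sample`, and the sample table are hypothetical names and data invented for illustration. The toy critic only checks whether every number cited in a response appears verbatim in the table, so it cannot catch omissions and would wrongly flag derived values such as sums. The learned 4B critic in the paper presumably does far more.

```python
import re
from typing import Callable, Dict, List, Optional

Table = List[Dict[str, object]]

# Matches integers and simple decimals, e.g. "709000" or "3.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def table_values(table: Table) -> set:
    """Collect every cell value in the table as a string."""
    return {str(value) for row in table for value in row.values()}


def toy_critic(table: Table, response: str) -> bool:
    """Hypothetical stand-in for a learned critic.

    Returns True (DRE suspected) if the response cites any number
    that does not appear verbatim in the table.
    """
    known = table_values(table)
    return any(num not in known for num in NUMBER.findall(response))


def rejection_sample(
    table: Table,
    candidates: List[str],
    has_dre: Callable[[Table, str], bool],
) -> Optional[str]:
    """Return the first candidate the critic accepts, or None if all are rejected."""
    for response in candidates:
        if not has_dre(table, response):
            return response
    return None


# Illustrative data only; not taken from the paper.
table = [
    {"city": "Oslo", "population": 709000},
    {"city": "Bergen", "population": 289000},
]
candidates = [
    "Oslo has 790000 residents, so Oslo is larger.",  # digits transposed
    "Oslo has 709000 residents and Bergen has 289000, so Oslo is larger.",
]

accepted = rejection_sample(table, candidates, toy_critic)
```

In this sketch the first candidate is rejected because 790000 is not a table value, and the second is accepted. In practice the candidates would be sampled from an LLM, and the critic would be a trained model rather than a string match.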

Evaluation and Benchmarking · Agent and Tool Ecosystem