paper

An Empirical Analysis of Factual Errors in Human-Written Text and its Application

paper · active · provisional · an-empirical-analysis-of-factual-errors-in-human-written-text-and-its-application-24738bb6 · 1 event · first seen 40h ago

Aliases: An Empirical Analysis of Factual Errors in Human-Written Text and its Application

Co-occurring entities

OpenAI GPT-5.5

More like this (12)

Beyond Accuracy: Community Perspectives on Machine Translation
Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness
Artificial Analysis Conversational Dynamics Plausibility Evaluation
From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models
Words Speak Louder Than Code: Investigating Cognitive Heuristics in LLM-Based Code Vulnerability Detection
Measuring Human Value Expression in Social Media Texts: Calibrated LLM Annotation and Encoder Transfer
counterfactual text generation
Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?
Text Analytics Evaluation Framework
Artificial Analysis Text to Image

Recent events (1)

arXiv · cs.CL · 40h ago

Empirical taxonomy of factual errors in human-written text reveals LLM detection gaps

A new arXiv paper introduces a taxonomy of factual error types in human-written text, derived from analysis of newspaper article corrections, identifying categories like kanji misconversions and numeral classifier errors absent from existing hallucination benchmarks. The authors evaluate several LLMs on Factual Error Detection (FED) tasks using both synthetic and real correction data. Even high-performance models like GPT-5.4 achieve only ~52% word-level F1 on synthetic data, underscoring the difficulty of detecting human-induced factual errors versus LLM hallucinations. The work highlights a neglected subproblem in factual accuracy research as the field has shifted focus toward LLM-generated hallucinations.
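To make the word-level F1 figure concrete, here is a minimal sketch of how such a score could be computed. It assumes that a detector flags word positions as erroneous and that those positions are compared against gold annotations. The summary does not state the paper's exact definition, so the function name `word_level_f1`, the whitespace tokenization, and the example sentence are illustrative assumptions, not the authors' evaluation code.

```python
def word_level_f1(predicted: set[int], gold: set[int]) -> float:
    """F1 over word positions flagged as factual errors.

    Assumption: predictions and gold labels are sets of word indices.
    The paper's actual scoring protocol may differ.
    """
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing flagged
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: whitespace tokens, with indices 5 and 6 annotated as wrong.
words = "The meeting was held on 12 March in Osaka".split()
gold = {5, 6}        # "12", "March"
predicted = {5, 8}   # "12", "Osaka"
score = word_level_f1(predicted, gold)
print(f"word-level F1 = {score:.2f}")
```

In this toy case the detector finds one of the two erroneous words and raises one false alarm, so precision and recall are both 0.5. A score near 52% therefore means a model is either missing a large share of the erroneous words, flagging many correct ones, or both.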

Evaluation and Benchmarking · AI Safety Research · An Empirical Analysis of Factual Errors in Human-Written Text and its Application · OpenAI GPT-5.5