TruthfulQA
truthfulqa-31167923 · 3 events · first seen 28d ago
Aliases: TruthfulQA
Recent events (3)
TruthfulQA: Measuring how models mimic human falsehoods
Researchers from the University of Oxford and OpenAI introduced TruthfulQA, a benchmark that measures whether language models give truthful answers or reproduce common human falsehoods. Its questions target topics where many people answer incorrectly because of misconceptions, conspiracy theories, or other false beliefs. In the reported results, larger models were not more truthful than smaller ones and were often less truthful, which highlights a key alignment challenge: scaling alone does not improve truthfulness.
MATCHA: Contrastive Semantic Alignment Metric for LLM Evaluation
MATCHA is a new automatic evaluation metric for LLMs that addresses a fundamental flaw in existing metrics: both token-overlap (ROUGE) and embedding-based (BERTScore) metrics routinely assign near-identical scores to semantically contradictory texts. The metric uses a dual-view approach that rewards proximity to a gold reference while penalizing adversarially generated counterfactual contradictions. Evaluated across eight benchmarks spanning QA, summarization, NLI, and semantic similarity tasks, MATCHA outperforms 23 embedding models and achieves 18.38% and 20.82% improvements over ROUGE-L and BERTScore respectively on TruthfulQA. Code and metric are publicly released.
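The dual-view idea can be illustrated with a small sketch. The summary above does not give MATCHA's actual formula, so everything here is an assumption for illustration: the function names (cosine, dual_view_score), the penalty weight, and the choice of cosine similarity over precomputed embedding vectors are all hypothetical and are not taken from the released code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def dual_view_score(candidate, reference, counterfactual, penalty=1.0):
    """Hypothetical dual-view score (not MATCHA's published formula).

    Rewards closeness to the gold reference and subtracts closeness
    to a contradicting counterfactual, scaled by `penalty`.
    """
    reward = cosine(candidate, reference)
    contradiction = cosine(candidate, counterfactual)
    return reward - penalty * contradiction


# Toy 2-d "embeddings": the reference and its contradiction point in
# different directions, so the two candidates are clearly separated.
reference = [1.0, 0.0]
counterfactual = [0.0, 1.0]
faithful = dual_view_score([1.0, 0.0], reference, counterfactual)
contradicting = dual_view_score([0.0, 1.0], reference, counterfactual)
```

In this toy setup a candidate aligned with the reference scores higher than one aligned with the contradiction. A metric based only on similarity to the reference cannot separate the two when the contradicting text sits almost as close to the reference as a faithful one does, which is the failure mode the summary attributes to ROUGE and BERTScore.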
CHAIR: Supervised hallucination detection via internal logit analysis across LLM layers
A new arXiv preprint introduces CHAIR (Classifier of Hallucination As ImproveR), a supervised framework that detects hallucinations by extracting statistical features (max, min, mean, std, slope) from token logits across all layers of an LLM. Evaluated on TruthfulQA and MMLU, CHAIR shows improved detection accuracy especially in zero-shot settings. The authors argue the approach also points toward richer internal representations for designing adaptive decoding strategies that reduce hallucinations.
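The five statistics named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that one scalar logit per layer has already been extracted for the token of interest (how that is done is outside this sketch), uses the population standard deviation, and defines slope as the least-squares slope of logit value against layer index. The function name layer_logit_features is hypothetical.

```python
from statistics import mean, pstdev


def layer_logit_features(logits_per_layer):
    """Summarize one token's per-layer logits with five statistics.

    Assumption: `logits_per_layer` holds one scalar per layer,
    ordered from the first layer to the last.
    """
    values = [float(v) for v in logits_per_layer]
    if not values:
        raise ValueError("need at least one layer value")

    n = len(values)
    layer_mean = (n - 1) / 2  # mean of the layer indices 0..n-1
    value_mean = mean(values)

    # Least-squares slope of value against layer index.
    denom = sum((i - layer_mean) ** 2 for i in range(n))
    if denom == 0:
        slope = 0.0  # a single layer has no trend
    else:
        numer = sum((i - layer_mean) * (v - value_mean)
                    for i, v in enumerate(values))
        slope = numer / denom

    return {
        "max": max(values),
        "min": min(values),
        "mean": value_mean,
        "std": pstdev(values),  # population standard deviation (assumed)
        "slope": slope,
    }


# Example: a logit that rises steadily by 1.0 per layer.
features = layer_logit_features([0.0, 1.0, 2.0, 3.0])
```

A feature vector like this, computed per token, is the kind of input a supervised classifier could be trained on to separate hallucinated from non-hallucinated outputs; the classifier itself is not shown here.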