MATCHA
matcha-abdc1813 · 1 event · first seen 21d ago · Aliases: MATCHA
Recent events (1)
MATCHA: Contrastive Semantic Alignment Metric for LLM Evaluation
MATCHA is a new automatic evaluation metric for LLMs that targets a fundamental flaw in existing metrics: both token-overlap metrics (ROUGE) and embedding-based metrics (BERTScore) routinely assign near-identical scores to semantically contradictory texts. MATCHA takes a dual-view approach: it rewards proximity to a gold reference and penalizes proximity to adversarially generated counterfactual contradictions. Evaluated on eight benchmarks spanning QA, summarization, NLI, and semantic similarity, MATCHA outperforms 23 embedding models and improves on ROUGE-L and BERTScore by 18.38% and 20.82%, respectively, on TruthfulQA. The code and the metric are publicly released.
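The dual-view idea can be illustrated with a minimal Python sketch. This is not MATCHA's released implementation: the scoring formula (cosine similarity to the reference minus a weighted maximum cosine similarity to the counterfactuals), the function names `cosine` and `dual_view_score`, the `penalty_weight` parameter, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from an embedding model.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dual_view_score(
    candidate: Vector,
    reference: Vector,
    counterfactuals: Sequence[Vector],
    penalty_weight: float = 1.0,  # hypothetical knob, not taken from the source
) -> float:
    """Illustrative dual-view score (an assumed formula, not MATCHA's own).

    Rewards similarity to the gold reference and subtracts the similarity
    to the closest counterfactual contradiction.
    """
    reward = cosine(candidate, reference)
    penalty = max((cosine(candidate, c) for c in counterfactuals), default=0.0)
    return reward - penalty_weight * penalty


# Toy 2-D "embeddings": axis 0 stands for the reference meaning,
# axis 1 for the contradicting meaning.
reference = [1.0, 0.0]
counterfactuals = [[0.0, 1.0]]
faithful_answer = [0.9, 0.1]
contradictory_answer = [0.1, 0.9]

faithful_score = dual_view_score(faithful_answer, reference, counterfactuals)
contradictory_score = dual_view_score(contradictory_answer, reference, counterfactuals)
print(faithful_score, contradictory_score)
```

Under these assumptions, a candidate that sits near the counterfactual is pushed below zero while a faithful candidate stays positive, which is the separation that a reference-only similarity score cannot guarantee.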