MATCHA

product · active · provisional · matcha-abdc1813 · 1 event · first seen 21d ago

Aliases: MATCHA

Recent events (1)

6 · arXiv · cs.CL · 21d ago

MATCHA: Contrastive Semantic Alignment Metric for LLM Evaluation

MATCHA is a new automatic evaluation metric for LLMs that targets a fundamental flaw in existing metrics: both token-overlap metrics (ROUGE) and embedding-based metrics (BERTScore) routinely assign near-identical scores to semantically contradictory texts. It takes a dual-view approach, rewarding proximity to a gold reference while penalizing proximity to adversarially generated counterfactual contradictions. Evaluated across eight benchmarks spanning QA, summarization, NLI, and semantic similarity, MATCHA outperforms 23 embedding models and, on TruthfulQA, improves on ROUGE-L and BERTScore by 18.38% and 20.82% respectively. The code and metric are publicly released.
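The dual-view idea can be illustrated with a minimal sketch. This is not MATCHA's published formula or its released code: the function `dual_view_score`, the `penalty_weight` parameter, the use of cosine similarity, and the hand-written three-dimensional vectors are all assumptions made for illustration. The sketch only shows the general shape of a score that rises with similarity to a reference and falls with similarity to a contradiction.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dual_view_score(
    candidate: Sequence[float],
    reference: Sequence[float],
    contradictions: Sequence[Sequence[float]],
    penalty_weight: float = 1.0,  # hypothetical knob, not taken from the paper
) -> float:
    """Reward closeness to the reference, penalize closeness to the nearest contradiction."""
    reward = cosine(candidate, reference)
    penalty = max((cosine(candidate, c) for c in contradictions), default=0.0)
    return reward - penalty_weight * penalty


# Toy embeddings, invented for illustration only.
reference = [1.0, 0.2, 0.0]        # gold answer
contradiction = [0.9, 0.1, 0.6]    # counterfactual that contradicts the gold answer
faithful = [0.95, 0.25, 0.05]      # candidate that agrees with the reference
contradictory = [0.85, 0.1, 0.65]  # candidate that agrees with the contradiction

faithful_score = dual_view_score(faithful, reference, [contradiction])
contradictory_score = dual_view_score(contradictory, reference, [contradiction])

print(f"reference-only similarity: {cosine(faithful, reference):.3f} vs "
      f"{cosine(contradictory, reference):.3f}")
print(f"dual-view score:           {faithful_score:.3f} vs {contradictory_score:.3f}")
```

In this toy setup both candidates sit fairly close to the reference under plain cosine similarity, while subtracting the contradiction term pushes the contradictory candidate's score below zero. A real metric would obtain the vectors from a trained encoder and the contradictions from a generation step, neither of which is modeled here.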