Entity · product

MATCHA

product · active · matcha-abdc1813 · 1 event · first seen May 27, 2026

Aliases: MATCHA

Co-occurring entities

TruthfulQA · ROUGE-L · Siran Li · BERTScore · contrastive semantic alignment

More like this (12)

KAFFEE · Chai-1 · Chai Asawa · MATS · Oolong · CHARM · MOJO · CAMMAR · TOFU · MoCA · MESA · MATEO

Recent events (1)

arXiv · cs.CL · May 27, 2026

MATCHA: Contrastive Semantic Alignment Metric for LLM Evaluation

MATCHA is a new automatic evaluation metric for LLMs that addresses a fundamental flaw in existing metrics: both token-overlap (ROUGE) and embedding-based (BERTScore) metrics routinely assign near-identical scores to semantically contradictory texts. The metric uses a dual-view approach that rewards proximity to a gold reference while penalizing proximity to adversarially generated counterfactual contradictions. Evaluated across eight benchmarks spanning QA, summarization, NLI, and semantic similarity tasks, MATCHA outperforms 23 embedding models and achieves improvements of 18.38% over ROUGE-L and 20.82% over BERTScore on TruthfulQA. The code and metric are publicly released.
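The failure mode and the dual-view idea can be illustrated with a toy example. This page does not give MATCHA's actual scoring function or its adversarial counterfactual generator, so the sketch below is an assumption, not the authors' method: it uses a simplified ROUGE-L F1 (lowercase whitespace tokens, no stemming) as the similarity function and a hand-written negation as the counterfactual. The names `contrastive_score` and `weight` are hypothetical. It shows plain ROUGE-L scoring a contradiction above a faithful paraphrase, while a "reference minus closest counterfactual" score separates the two by sign.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 over lowercase whitespace tokens (no stemming)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def contrastive_score(candidate: str, reference: str,
                      counterfactuals: List[str], weight: float = 1.0) -> float:
    """Hypothetical dual-view score (not MATCHA's formula): reward similarity to
    the gold reference, penalize similarity to the closest counterfactual."""
    penalty = max(rouge_l_f1(candidate, cf) for cf in counterfactuals)
    return rouge_l_f1(candidate, reference) - weight * penalty


reference = "the drug is safe for children"
# Hand-written here; MATCHA reportedly generates counterfactuals adversarially.
counterfactuals = ["the drug is not safe for children"]
faithful = "the drug is safe for kids"
contradictory = "the drug is not safe for children"

print(round(rouge_l_f1(faithful, reference), 3))       # 0.833
print(round(rouge_l_f1(contradictory, reference), 3))  # 0.923
print(round(contrastive_score(faithful, reference, counterfactuals), 3))       # 0.064
print(round(contrastive_score(contradictory, reference, counterfactuals), 3))  # -0.077
```

Plain ROUGE-L ranks the contradiction (0.923) above the faithful paraphrase (0.833), which is the flaw the summary describes. Subtracting similarity to the closest counterfactual flips that ordering. A learned embedding model could replace `rouge_l_f1` as the similarity function without changing the structure of the score.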

Evaluation and Benchmarking · AI Safety Research · TruthfulQA · ROUGE-L · Siran Li