Entity · benchmark

MaFI

benchmark · active · mafi-ceea98c3 · 1 event · first seen Jun 8, 2026

Aliases: MaFI

Co-occurring entities

The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Human Lipreaders?

More like this (12)

MAI PaFi OpenMAIC AFM 3 FAISS FigSIM CA-MHFA FMLM+MMAE FINO MNLI MemFT

Recent events (1)

5 · arXiv · cs.CL · Jun 8, 2026

VSR models outperform humans on lipreading benchmarks but rely on language cues, not visual perception

A new arXiv paper compares three visual speech recognition (VSR) systems against human lipreaders on the MaFI dataset using word-, character-, phoneme-, and viseme-level metrics. Despite higher overall accuracy, the VSR models succeed and fail on different words than humans do, and their errors are better explained by word frequency in the training data than by visual informativeness. A text-only n-gram baseline given minimal phoneme input rivals human performance, suggesting that VSR systems primarily exploit language priors rather than genuine visual speech perception. The findings raise questions about whether benchmark-beating performance reflects the capability the benchmark purports to measure.
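
To make the difference between phoneme-level and viseme-level scoring concrete, here is a minimal Python sketch. It is not the paper's evaluation code: the `PHONEME_TO_VISEME` table is a hypothetical, heavily simplified grouping invented for illustration, and the helper names (`edit_distance`, `error_rate`, `to_visemes`) are assumptions. The only idea taken from the summary is that the same prediction can be scored at several granularities.

```python
from typing import Dict, List, Sequence

# Hypothetical, heavily simplified phoneme-to-viseme grouping.
# Real viseme inventories differ; this table exists only for illustration.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",
    "f": "LABIODENTAL", "v": "LABIODENTAL",
    "t": "ALVEOLAR", "d": "ALVEOLAR", "n": "ALVEOLAR",
    "ae": "OPEN_VOWEL", "aa": "OPEN_VOWEL",
}


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def to_visemes(phonemes: Sequence[str]) -> List[str]:
    """Map phonemes to viseme classes; unmapped phonemes pass through unchanged."""
    return [PHONEME_TO_VISEME.get(p, p) for p in phonemes]


# "bat" vs. "mad": different phonemes, same lip shapes under this toy mapping.
reference = ["b", "ae", "t"]
prediction = ["m", "ae", "d"]

phoneme_error = error_rate(reference, prediction)
viseme_error = error_rate(to_visemes(reference), to_visemes(prediction))
```

Under this toy mapping, predicting "mad" for "bat" is wrong at the word level and gets two of three phonemes wrong, yet produces no viseme-level error, because the confused sounds look the same on the lips. Scoring at several granularities is what lets an analysis separate visually plausible confusions from errors driven by something else, such as language priors.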
