Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading

paper · active · provisional · cross-modal-masking-for-robust-silent-speech-synthesis-using-semg-and-lipreading-e11b3e7d · 1 event · first seen 8d ago

Recent events (1)

arXiv · cs.CL · 8d ago

Cross-modal masking framework improves silent speech synthesis from sEMG and lipreading

Researchers propose a masked multimodal speech synthesis framework trained jointly on surface electromyography (sEMG) and video-based lipreading signals. Modality masking during training is used to improve robustness to sensor failure or degradation. In multispeaker settings, the approach reduces word error rate by up to 14 absolute percentage points relative to the strongest unimodal baseline. Masking strategies also outperform degradation-specific data augmentation for handling missing modalities, and a phone-level analysis reveals complementary contributions across vowels and consonant groups.
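
To make the idea of modality masking concrete, here is a minimal sketch of one common variant, per-sample modality dropout, written with NumPy. The summary does not describe the paper's actual masking scheme, so everything here is an assumption for illustration: the function name `mask_modalities`, the `p_mask` parameter, the tensor shapes, and the choice to zero out at most one modality per sample.

```python
import numpy as np

def mask_modalities(emg, video, p_mask=0.3, rng=None):
    """Randomly zero out at most one modality per sample (hypothetical helper).

    emg:   array of shape (batch, time, emg_channels)
    video: array of shape (batch, time, video_features)
    Returns the masked copies and two boolean keep-flags of shape (batch,).
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = emg.shape[0]

    # Per-sample draw: 0 = keep both, 1 = drop sEMG, 2 = drop video.
    # Both modalities are never dropped together, so every sample
    # still carries some input signal.
    choice = rng.choice(3, size=batch, p=[1.0 - p_mask, p_mask / 2, p_mask / 2])
    keep_emg = choice != 1
    keep_video = choice != 2

    # Broadcast the per-sample flags over the time and feature axes.
    emg_out = emg * keep_emg[:, None, None]
    video_out = video * keep_video[:, None, None]
    return emg_out, video_out, keep_emg, keep_video


# Usage: mask a toy batch before feeding it to a (not shown) fusion model.
rng = np.random.default_rng(0)
emg = rng.normal(size=(4, 50, 8))      # assumed: 8 sEMG channels
video = rng.normal(size=(4, 50, 16))   # assumed: 16 lip features per frame
emg_m, video_m, keep_emg, keep_video = mask_modalities(emg, video, rng=rng)
```

Training on batches masked this way exposes the model to sEMG-only and video-only inputs, which is the intuition behind robustness to a missing or failed sensor at test time. Zero-filling is only one option; a learned placeholder embedding is another common choice.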