Connecting Speech to Words through Images

Recent events (1)

arXiv · cs.CL · 25h ago

Visually grounded method connects spoken words to written text without textual supervision

Researchers present a method for learning mappings between written words and spoken audio using only images and spoken captions, with no explicit text supervision. The approach uses image captioning to build a written vocabulary, then applies unsupervised word discovery to align spoken utterances to those words. The system outperforms a neural baseline on spoken word retrieval and keyword spotting tasks in English, with implications for low-resource language processing.
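The two-stage idea in the summary (written words from image captioning, spoken units from unsupervised word discovery, linked only through shared images) can be illustrated with a toy co-occurrence aligner. This is a minimal sketch under stated assumptions, not the paper's method: the function name `align_units_to_words`, the unit IDs such as `u1`, the toy data, and the choice of the Dice coefficient as the association score are all hypothetical and introduced here for illustration.

```python
from collections import Counter, defaultdict


def align_units_to_words(pairs):
    """Map each discovered spoken unit to its most associated written word.

    `pairs` is an iterable of (written_words, spoken_units), one entry per
    image. Hypothetical inputs: written_words would come from an image
    captioner, spoken_units from unsupervised word discovery run on the
    spoken caption of the same image.
    """
    cooc = defaultdict(Counter)  # cooc[unit][word] = images containing both
    unit_count = Counter()       # images containing each spoken unit
    word_count = Counter()       # images containing each written word

    for words, units in pairs:
        words, units = set(words), set(units)
        for w in words:
            word_count[w] += 1
        for u in units:
            unit_count[u] += 1
            for w in words:
                cooc[u][w] += 1

    mapping = {}
    for u, counter in cooc.items():
        # Dice coefficient (an assumed scoring choice); ties broken by word.
        mapping[u] = max(
            counter,
            key=lambda w: (2 * counter[w] / (unit_count[u] + word_count[w]), w),
        )
    return mapping


# Toy data: three images, each with caption words and discovered unit IDs.
toy_pairs = [
    (["dog", "ball"], ["u1", "u2"]),
    (["dog", "grass"], ["u1", "u3"]),
    (["ball", "beach"], ["u2", "u4"]),
]
toy_mapping = align_units_to_words(toy_pairs)
print(toy_mapping)
```

On this toy data each unit ID ends up paired with the written word it shares the most images with, relative to how often each occurs overall. No transcript of the speech is used at any point, which is the property the summary highlights.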