paper
Connecting Speech to Words through Images
paper · active · provisional
connecting-speech-to-words-through-images-84687ed7 · 1 event · first seen 25h ago
Aliases: Connecting Speech to Words through Images
More like this (12)
Thinking-with-Images · Speech-to-Speech · speech-to-avatar systems · Artificial Analysis Text to Image · image-semantic guided poetry detection · vision-language grounding · text-to-image models · visual language model · Subject-driven Image Generation · Contrastive Language-Image Pretraining (CLIP) · Vision-Language Models · Vision-Language-Action models
Recent events (1)
Visually grounded method connects spoken words to written text without textual supervision
Researchers present a method for learning mappings between written words and spoken audio using only images and spoken captions, with no explicit text supervision. The approach uses image captioning to build a written vocabulary, then applies unsupervised word discovery to align spoken utterances to those words. The system outperforms a neural baseline on spoken word retrieval and keyword spotting tasks in English, with implications for low-resource language processing.
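To make the image-as-pivot idea concrete, here is a toy sketch. It is not the authors' implementation. It assumes each image already comes with a set of written words (as an image captioner might produce) and a set of discovered spoken-unit IDs (as unsupervised word discovery might produce from its spoken caption). The function name `align_units_to_words`, the unit labels `u1`..`u4`, and the use of a Dice co-occurrence score are all illustrative assumptions; the summary does not say how the actual alignment is scored.

```python
from collections import Counter
from itertools import product


def align_units_to_words(images):
    """Map each spoken unit to the written word it co-occurs with most.

    images: list of (written_words, spoken_units) pairs, one per image.
    Both elements are hypothetical inputs: words from a captioner,
    units from an unsupervised word-discovery step.
    """
    word_count, unit_count, joint = Counter(), Counter(), Counter()
    for words, units in images:
        words, units = set(words), set(units)
        word_count.update(words)
        unit_count.update(units)
        # Count every (unit, word) pair that shares an image.
        joint.update(product(units, words))

    best = {}
    for (unit, word), n in joint.items():
        # Dice coefficient: an assumed scoring choice for this sketch.
        dice = 2 * n / (unit_count[unit] + word_count[word])
        if unit not in best or dice > best[unit][1]:
            best[unit] = (word, dice)
    return {unit: word for unit, (word, _) in best.items()}


# Toy data: four images, each with two caption words and two spoken units.
toy = [
    ({"dog", "grass"}, {"u1", "u2"}),
    ({"dog", "ball"}, {"u1", "u3"}),
    ({"grass", "tree"}, {"u2", "u4"}),
    ({"ball", "tree"}, {"u3", "u4"}),
]

alignment = align_units_to_words(toy)
```

No text transcript of the speech is used at any point: the only link between a spoken unit and a written word is that both were attached to the same images. On the toy data each unit co-occurs with exactly one word in every image where it appears, so the alignment is unambiguous; real data would be far noisier.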