FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS
paper · active · provisional
flowedit-associative-memory-for-lifelong-pronunciation-adaptation-in-flow-matching-tts-56e8206b · 1 event · first seen 47h ago
Aliases: FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS
More like this (12)
CapSpeech-TTS
SoundFlow
Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR
Flow Matching
Langflow
FlowEdit
DirectAudioEdit: Inversion-Free Text-Guided Audio Editing via Diffusion Prediction Contrast
flow-matching decoder
Connecting Speech to Words through Images
How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech
Speech-to-Speech
language-adaptive switch
Recent events (1)
FlowEdit: Lifelong pronunciation adaptation for flow-matching TTS via associative memory
FlowEdit is a new framework enabling lifelong pronunciation correction in frozen flow-matching text-to-speech systems without retraining model weights. Corrections are stored as token-level perturbations in text embedding space within a Modern Hopfield Network, retrieved at inference via soft attention with fuzzy morphological matching. On a curated benchmark of 312 multilingual proper nouns across 18 language families, the method reduces target-word Phoneme Error Rate by 92.7% relative to the zero-shot baseline, with each correction completing in ~15 seconds on a single GPU.
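
The retrieval step described above can be illustrated with a small sketch. This is not the paper's implementation: the class name `CorrectionMemory`, the parameters `beta` and `min_match`, and the toy 2-dimensional embeddings are all hypothetical, and `difflib.SequenceMatcher` is only a stand-in for the fuzzy morphological matching, which the summary does not specify. The sketch assumes that stored corrections are first gated by surface-form similarity and then combined with Hopfield-style softmax attention over the stored keys.

```python
import math
from difflib import SequenceMatcher


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CorrectionMemory:
    """Hypothetical store of (word, key embedding, perturbation) triples."""

    def __init__(self, beta=8.0, min_match=0.8):
        self.beta = beta            # inverse temperature (assumed value)
        self.min_match = min_match  # fuzzy-match threshold (assumed value)
        self.words, self.keys, self.deltas = [], [], []

    def store(self, word, key, delta):
        """Record one correction; the model weights are never touched."""
        self.words.append(word)
        self.keys.append(list(key))
        self.deltas.append(list(delta))

    def retrieve(self, word, query):
        """Return the perturbation to add to the token embedding `query`."""
        # Fuzzy gate: keep entries whose spelling is close to `word`.
        # This is a stand-in for morphological matching.
        candidates = [
            i for i, stored in enumerate(self.words)
            if SequenceMatcher(None, word.lower(), stored.lower()).ratio()
            >= self.min_match
        ]
        if not candidates:
            return [0.0] * len(query)  # unrelated words stay unperturbed

        # Soft attention: softmax(beta * <key, query>) over the candidates.
        scores = [
            self.beta * sum(k * q for k, q in zip(self.keys[i], query))
            for i in candidates
        ]
        weights = softmax(scores)
        return [
            sum(w * self.deltas[i][d] for w, i in zip(weights, candidates))
            for d in range(len(query))
        ]


memory = CorrectionMemory()
memory.store("Nguyen", key=[1.0, 0.0], delta=[0.5, -0.5])
memory.store("Siobhan", key=[0.0, 1.0], delta=[-1.0, 1.0])

# An inflected form still retrieves the stored correction.
inflected = memory.retrieve("Nguyens", [1.0, 0.0])
# A word with no close stored form gets a zero perturbation.
unrelated = memory.retrieve("London", [1.0, 0.0])
```

In this sketch a new correction is a single `store` call, which mirrors the summary's point that the TTS model stays frozen. How the perturbation itself is obtained (the roughly 15-second step) is outside what the sketch covers.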