
FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS

paper · active · provisional · flowedit-associative-memory-for-lifelong-pronunciation-adaptation-in-flow-matching-tts-56e8206b · 1 event · first seen 47h ago


Co-occurring entities

Modern Hopfield Network; FlowEdit

More like this (12)

CapSpeech-TTS; SoundFlow; Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR; Flow Matching; Langflow; FlowEdit; DirectAudioEdit: Inversion-Free Text-Guided Audio Editing via Diffusion Prediction Contrast; flow-matching decoder; Connecting Speech to Words through Images; How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech; Speech-to-Speech; language-adaptive switch

Recent events (1)

5 · arXiv · cs.AI · 47h ago

FlowEdit: Lifelong pronunciation adaptation for flow-matching TTS via associative memory

FlowEdit is a new framework enabling lifelong pronunciation correction in frozen flow-matching text-to-speech systems without retraining model weights. Corrections are stored as token-level perturbations in text embedding space within a Modern Hopfield Network, retrieved at inference via soft attention with fuzzy morphological matching. On a curated benchmark of 312 multilingual proper nouns across 18 language families, the method reduces target-word Phoneme Error Rate by 92.7% relative to the zero-shot baseline, with each correction completing in ~15 seconds on a single GPU.
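The retrieval step described above can be illustrated with a minimal sketch of Modern Hopfield style soft-attention lookup. This is not the paper's implementation: the class name `HopfieldCorrectionMemory`, the inverse temperature `beta`, and the toy 2-dimensional embeddings are all assumptions made for illustration, and the fuzzy morphological matching mentioned in the summary is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class HopfieldCorrectionMemory:
    """Hypothetical store of (key embedding, perturbation) pairs.

    Keys stand in for the text embeddings of corrected tokens; deltas
    are the perturbations added to those embeddings at inference.
    """

    def __init__(self, beta=8.0):
        self.beta = beta   # assumed inverse temperature; higher = sharper recall
        self.keys = []
        self.deltas = []

    def store(self, key, delta):
        # Adding a correction only appends to memory; no model weights change.
        self.keys.append(list(key))
        self.deltas.append(list(delta))

    def retrieve(self, query):
        # An empty memory contributes no perturbation.
        if not self.keys:
            return [0.0] * len(query)
        # Soft attention: weights = softmax(beta * <key, query>).
        weights = softmax([self.beta * dot(k, query) for k in self.keys])
        dim = len(self.deltas[0])
        return [
            sum(w * d[i] for w, d in zip(weights, self.deltas))
            for i in range(dim)
        ]

    def apply(self, query):
        # Perturbed embedding that would be passed on to the frozen TTS model.
        return [q + d for q, d in zip(query, self.retrieve(query))]


memory = HopfieldCorrectionMemory(beta=8.0)
memory.store(key=[1.0, 0.0], delta=[0.5, 0.0])
memory.store(key=[0.0, 1.0], delta=[0.0, -0.5])
corrected = memory.apply([1.0, 0.0])
```

With a large `beta`, a query close to a stored key recovers almost exactly that key's perturbation. Because softmax weights always sum to one, this sketch perturbs every query once the memory is non-empty; a practical system would need some gating so that unrelated tokens pass through unchanged, and how FlowEdit handles that is not stated in the summary.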

Inference Economics; Enterprise Deployment Patterns; Modern Hopfield Network; FlowEdit; FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS