DAAM
technique · active · provisional
daam-2b097e45 · 1 event · first seen 2d ago
Aliases: DAAM
Recent events (1)
Cross-attention attribution reveals how natural language instructions shape speech diffusion model outputs
Researchers adapt DAAM (Diffusion Attentive Attribution Maps), a cross-attention attribution framework, to speech diffusion models for the first time, applying it to CapSpeech-TTS to analyze how individual caption tokens influence the acoustic output. The study covers 3,600 style-caption/transcript combinations across 25 layers and 24 ODE steps and produces per-token heatmaps. Key findings: style tokens exhibit lower temporal variance than content tokens, style attention correlates with F0 and energy, and style conditioning peaks in early diffusion steps and deep layers. The work is presented as the first interpretability study of natural language conditioning in speech diffusion models.
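To make the per-token heatmap idea concrete, here is a minimal sketch of DAAM-style aggregation, not the study's actual code. It assumes cross-attention weights have already been collected into one array of shape (ODE steps, layers, heads, frames, tokens) and that every layer shares the same frame resolution. The function names `token_heatmaps` and `temporal_variance`, the head count, and the random input are hypothetical and used for illustration only.

```python
import numpy as np


def token_heatmaps(attn):
    """Aggregate cross-attention into one time-axis heatmap per caption token.

    attn: array of shape (steps, layers, heads, frames, tokens).
    Returns an array of shape (tokens, frames).
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 5:
        raise ValueError("expected shape (steps, layers, heads, frames, tokens)")
    # DAAM-style aggregation: sum attention over steps, layers, and heads.
    return attn.sum(axis=(0, 1, 2)).T


def temporal_variance(heatmaps):
    """Variance over time of each token's heatmap, normalized to sum to 1."""
    totals = heatmaps.sum(axis=1, keepdims=True)
    normalized = heatmaps / np.where(totals == 0, 1.0, totals)
    return normalized.var(axis=1)


# Synthetic stand-in for captured attention (assumption: 4 heads, 50 frames,
# 6 caption tokens); 24 steps and 25 layers mirror the counts in the summary.
rng = np.random.default_rng(0)
steps, layers, heads, frames, tokens = 24, 25, 4, 50, 6
logits = rng.normal(size=(steps, layers, heads, frames, tokens))
attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)  # softmax over caption tokens

maps = token_heatmaps(attn)          # shape (tokens, frames)
variances = temporal_variance(maps)  # one value per caption token
```

Under these assumptions, a token whose heatmap is spread evenly over the utterance gets a low `temporal_variance`, which is the kind of quantity the style-versus-content comparison relies on. Restricting the sum to a subset of steps or layers would give the per-step and per-layer views mentioned in the summary.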