Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition

paper · active · provisional · acoustic-cue-alignment-in-audio-language-models-for-speech-emotion-recognition-c56f1d8e · 1 event · first seen 9d ago

Recent events (1)

arXiv · cs.CL · 9d ago

Acoustic cue alignment tokens improve speech emotion recognition in audio language models

Researchers study whether instruction-following audio language models (ALMs) make grounded use of explicit acoustic cues when the raw audio is already available. They derive six interpretable acoustic concept tokens from the eGeMAPS feature set, append them to the text prompt, and evaluate on the FAU-Aibo and IEMOCAP benchmarks. Aligned tokens improve unweighted average recall (UAR), while shuffled or corrupted tokens degrade it. The models do not fully collapse under perturbation, however, which indicates that they remain partially anchored to the audio signal. The work offers a practical probing method for studying interpretability and robustness in affective computing with ALMs.
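
The probing setup summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the feature names (`pitch_mean`, `loudness_mean`), the low/high thresholds, the prompt wording, and all function names are hypothetical, chosen only to show how concept tokens might be appended to a prompt, how a shuffled control breaks the alignment between cues and audio, and how unweighted average recall is computed as the mean of per-class recalls.

```python
import random
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def to_concept_tokens(features, thresholds):
    """Map each numeric feature to a coarse 'name=low' or 'name=high' token.

    Feature names and thresholds are hypothetical placeholders.
    """
    return [
        f"{name}={'high' if features[name] >= thresholds[name] else 'low'}"
        for name in sorted(features)
    ]


def build_prompt(instruction, tokens):
    """Append the concept tokens to a text instruction (hypothetical format)."""
    return f"{instruction}\nAcoustic cues: {', '.join(tokens)}"


def shuffled_control(token_lists, seed=0):
    """Reassign token lists across utterances to break cue-audio alignment."""
    rng = random.Random(seed)
    shuffled = list(token_lists)
    rng.shuffle(shuffled)
    return shuffled


# Example with made-up values for one utterance.
thresholds = {"pitch_mean": 180.0, "loudness_mean": 0.5}
features = {"pitch_mean": 220.0, "loudness_mean": 0.3}
tokens = to_concept_tokens(features, thresholds)
prompt = build_prompt("Classify the speaker's emotion.", tokens)
```

In a setup like this, the same model would be scored once with aligned prompts and once with prompts built from `shuffled_control`, and the gap in unweighted average recall between the two runs would indicate how much the model relies on the text cues rather than the audio.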