paper
Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
paper · active · provisional
acoustic-cue-alignment-in-audio-language-models-for-speech-emotion-recognition-c56f1d8e · 1 event · first seen 9d ago
Aliases: Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
More like this (12)
Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
Speaker Group Encoding in Self-supervised Speech Recognition Models
Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures
Audio Interaction Model
Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading
Beyond task performance: Decoding bioacoustic embeddings with speech features
Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation
Semantic-Acoustic Equilibrium
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models
contrastive semantic alignment
speaker-attribute classification
Recent events (1)
Acoustic cue alignment tokens improve speech emotion recognition in audio language models
Researchers study whether instruction-following audio language models (ALMs) make grounded use of explicit acoustic cues when the raw audio is already available. They derive six interpretable acoustic concept tokens from the eGeMAPS feature set and append them to the text prompt, then evaluate on the FAU-Aibo and IEMOCAP benchmarks. Aligned tokens improve unweighted average recall (UAR), while shuffled or corrupted tokens degrade it. The models do not collapse completely under perturbation, however, which indicates that they remain partially anchored to the audio signal. The work offers a practical probing method for studying the interpretability and robustness of ALMs in affective computing.
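The probing setup summarized above can be illustrated with a minimal sketch. It is not the authors' code. The summary does not name the six concept tokens or give the prompt wording, so the cue strings, the prompt format, and the helper names `build_prompt` and `shuffle_cues_across_utterances` are hypothetical. Only the metric is standard: unweighted average recall is the mean of per-class recall, so each emotion class counts equally regardless of how many utterances it has.

```python
import random
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def build_prompt(question, cue_tokens):
    """Append acoustic concept tokens to a text prompt.

    Hypothetical format: the actual prompt wording is not given in the summary.
    """
    return f"{question} Acoustic cues: {', '.join(cue_tokens)}."


def shuffle_cues_across_utterances(cues_per_utterance, seed=0):
    """Control condition (assumed design): permute cue sets across utterances
    so that the cues no longer describe the audio they are paired with."""
    rng = random.Random(seed)
    shuffled = list(cues_per_utterance)
    rng.shuffle(shuffled)
    return shuffled


# Illustrative data only; these cue strings are placeholders, not the paper's tokens.
cues = [["high pitch", "loud"], ["low pitch", "quiet"], ["fast rate", "loud"]]
aligned_prompts = [build_prompt("What emotion is expressed?", c) for c in cues]
control_prompts = [
    build_prompt("What emotion is expressed?", c)
    for c in shuffle_cues_across_utterances(cues, seed=1)
]
```

Comparing UAR on predictions obtained with `aligned_prompts` against those obtained with `control_prompts` gives the kind of aligned-versus-perturbed contrast the summary describes. A gap between the two suggests the model reads the cues, while performance that stays above chance under the control suggests it still relies on the audio.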