Entity · paper

Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition

paper · active · acoustic-cue-alignment-in-audio-language-models-for-speech-emotion-recognition-c56f1d8e · 1 event · first seen Jun 8, 2026

Aliases: Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition

Co-occurring entities

FAU-Aibo · IEMOCAP · eGeMAPS

More like this (12)

Auditing Protocol-Level Shortcuts in Large Audio Language Model Judges for Speech Evaluation
Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
Speaker Group Encoding in Self-supervised Speech Recognition Models
Self-supervised Speech Comparison for L2 Phone, Rhythm, and Intonation Scoring
Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures
Audio Interaction Model
Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading
Beyond task performance: Decoding bioacoustic embeddings with speech features
Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation
Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families
Semantic-Acoustic Equilibrium
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data

Recent events (1)

arXiv · cs.CL · Jun 8, 2026

Acoustic cue alignment tokens improve speech emotion recognition in audio language models

Researchers study whether instruction-following audio language models (ALMs) use explicit acoustic cues in a grounded way when the raw audio is already available to them. They derive six interpretable acoustic concept tokens from the eGeMAPS feature set, append them to the text prompt, and evaluate on the FAU-Aibo and IEMOCAP benchmarks. Aligned tokens improve unweighted average recall (UAR), while shuffled or corrupted tokens degrade it. Models do not fully collapse under these perturbations, however, which suggests they remain partially anchored to the audio signal. The work offers a practical probing method for studying interpretability and robustness of ALMs in affective computing.
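The summary above does not name the six concepts, say how eGeMAPS descriptors are discretized, or define the exact shuffling and corruption procedures. The sketch below illustrates the probing setup under assumed choices: the concept names in `CONCEPTS`, the z-score cut-off in `bucket`, the token format, and the helpers `corrupt`, `shuffle_across`, and `build_prompt` are all hypothetical, not the authors' implementation. Only the UAR definition (mean of per-class recall) is standard.

```python
import random
from collections import defaultdict

# HYPOTHETICAL six concepts, each meant to summarize one or more eGeMAPS
# descriptors (e.g. F0, loudness, jitter/shimmer, spectral slope). The
# paper's actual concept names and feature mapping are not given here.
CONCEPTS = ("pitch_level", "pitch_variability", "loudness",
            "voice_quality", "spectral_tilt", "speaking_rate")
LEVELS = ("low", "mid", "high")


def bucket(z: float, cut: float = 0.5) -> str:
    """Map a normalised z-score to a coarse level (assumed cut-off)."""
    if z < -cut:
        return "low"
    if z > cut:
        return "high"
    return "mid"


def concept_tokens(z_scores: dict[str, float]) -> list[str]:
    """Render one interpretable token per concept, e.g. '<loudness=high>'."""
    return [f"<{c}={bucket(z_scores[c])}>" for c in CONCEPTS]


def corrupt(tokens: list[str], rng: random.Random) -> list[str]:
    """Assumed control: keep concept names, replace each level at random."""
    out = []
    for tok in tokens:
        name = tok[1:-1].split("=")[0]  # strip '<' and '>' then take the name
        out.append(f"<{name}={rng.choice(LEVELS)}>")
    return out


def shuffle_across(batch: list[list[str]], rng: random.Random) -> list[list[str]]:
    """Assumed control: reassign token lists across utterances.

    A derangement is not enforced, so some utterances may keep their own cues.
    """
    perm = list(range(len(batch)))
    rng.shuffle(perm)
    return [batch[i] for i in perm]


def build_prompt(instruction: str, tokens: list[str]) -> str:
    """Append cue tokens to the text prompt; audio goes to the ALM separately."""
    return f"{instruction}\nAcoustic cues: {' '.join(tokens)}"


def unweighted_average_recall(y_true, y_pred) -> float:
    """UAR: mean of per-class recall, so every class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

UAR is the usual metric on class-imbalanced emotion corpora such as FAU-Aibo because it weights every class equally: with true labels `["ang", "ang", "neu", "neu", "neu", "neu"]` and one "ang" clip predicted as "neu", accuracy is 5/6 but UAR is (0.5 + 1.0) / 2 = 0.75. Comparing UAR across the aligned, shuffled, and corrupted prompt conditions is one way to check how much a model relies on the appended cues versus the audio.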

Evaluation and Benchmarking · Multimodal Progress · FAU-Aibo · Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition · IEMOCAP