Entity · paper

Speaker Group Encoding in Self-supervised Speech Recognition Models

paper · active · speaker-group-encoding-in-self-supervised-speech-recognition-models-f18dba7b · 1 event · first seen Jun 10, 2026


More like this (12)

From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
speaker-attribute classification
Interleaved Speech Language Models Latently Work In Text
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
Sparse Autoencoders
Self-Supervised Pretraining
Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading
Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
Self-Supervised Learning
encoder-only language models
Beyond task performance: Decoding bioacoustic embeddings with speech features

Recent events (1)

arXiv · cs.CL · Jun 10, 2026

Study reveals how self-supervised speech models encode speaker group attributes across fine-tuning stages

Researchers investigate what self-supervised speech recognition models (S3Ms) encode about speaker group categories (gender, age, dialect, ethnicity, and native-speaker status) in four model states: pretrained, fine-tuned for speaker identification (SID), fine-tuned for automatic speech recognition (ASR), and fairness-enhanced. They find that SID fine-tuning amplifies phonetically variant speaker group information, whereas ASR fine-tuning discards it while retaining semantically variant information. Fairness-enhancing ASR algorithms mainly alter the encoding of phonetically variant speaker groups and have limited impact on semantically variant categories. The findings offer guidance for designing fairer ASR systems.
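The summary does not say how speaker group information is measured. One common approach for this kind of question is a probe: a simple classifier trained to predict the group label from a model's embeddings, with held-out accuracy taken as a rough indicator of how much group information the representation keeps. The sketch below is an illustration under that assumption, not the authors' method. It uses synthetic vectors in place of real S3M embeddings, two hypothetical groups, and a nearest-class-mean probe (a simple linear classifier); the function names and the `signal` parameter are invented for the example.

```python
import random

def make_embeddings(n, dim, signal, seed):
    """Synthetic stand-in for layer embeddings (an assumption, not real
    model output): Gaussian noise plus a group-dependent shift of size
    `signal` along the first dimension."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # two hypothetical speaker groups, alternating
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if label == 1 else -signal
        data.append((vec, label))
    return data

def class_means(train):
    """Mean embedding per group label."""
    dim = len(train[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for vec, label in train:
        counts[label] += 1
        for j, x in enumerate(vec):
            sums[label][j] += x
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def probe_accuracy(data, train_fraction=0.5):
    """Fit a nearest-class-mean probe on the first part of the data and
    return its accuracy on the held-out remainder."""
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    means = class_means(train)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    correct = 0
    for vec, label in test:
        pred = min(means, key=lambda c: sqdist(vec, means[c]))
        correct += pred == label
    return correct / len(test)

# signal=2.0 mimics a representation that keeps group information;
# signal=0.0 mimics one that has discarded it.
acc_encoded = probe_accuracy(make_embeddings(400, 8, signal=2.0, seed=0))
acc_discarded = probe_accuracy(make_embeddings(400, 8, signal=0.0, seed=1))
print(f"group information kept:      {acc_encoded:.2f}")
print(f"group information discarded: {acc_discarded:.2f}")
```

In a real study the synthetic vectors would be replaced by embeddings extracted from each model state, and the probe accuracies compared across states and layers. High probe accuracy shows that the information is linearly recoverable, not that the model actually uses it.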

Evaluation and Benchmarking · AI Safety Research