Almanac
paper

Speaker Group Encoding in Self-supervised Speech Recognition Models




Recent events (1)

arXiv · cs.CL · 7d ago

Study reveals how self-supervised speech models encode speaker group attributes across fine-tuning stages

Researchers investigate what self-supervised speech recognition models (S3Ms) encode about speaker group categories (gender, age, dialect, ethnicity, and native-speaker status) in four states: pretrained, fine-tuned for speaker identification (SID), fine-tuned for automatic speech recognition (ASR), and fairness-enhanced. They find that SID fine-tuning amplifies phonetically variant speaker group information, while ASR fine-tuning discards it but retains semantically variant information. Fairness-enhancing ASR algorithms mainly affect the encoding of phonetically variant speaker groups and have limited impact on semantically variant categories. The findings offer guidance for designing fairer ASR systems.