Almanac
technique

K-means

technique · active · provisional · k-means-8a417612 · 1 event · first seen 6d ago

Aliases: K-means

Recent events (1)

4 · arXiv · cs.CL · 6d ago

Study finds lower bitrate discrete speech representations sufficient for generative spoken language modeling

Researchers investigate how segmentation width and cluster size affect speech resynthesis and continuation quality in Generative Spoken Language Models (GSLM), which train language models on discrete speech units without text. They find that intelligible, natural speech can be synthesized at lower bitrates than the standard baseline, and that continuation quality remains stable at reduced bitrates, suggesting conventional GSLM settings may be over-specified. The paper also notes that LLM-based evaluation metrics correlate better with human judgments than conventional metrics, but correlation remains low, pointing to a gap in automatic evaluation for speech generation.
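
To make the link between segmentation width, cluster size, and bitrate concrete, here is a minimal sketch. It assumes each speech segment is mapped to one of K cluster IDs (for example by k-means) and that every ID costs log2(K) bits, which is an upper bound that ignores entropy coding and unit deduplication. The function name and the numbers are illustrative only and are not the paper's settings.

```python
import math


def unit_bitrate(num_clusters: int, segment_ms: float) -> float:
    """Upper-bound bitrate (bits/s) of a discrete unit stream.

    Assumes one unit per segment of `segment_ms` milliseconds and a
    fixed-length code of log2(num_clusters) bits per unit.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    units_per_second = 1000.0 / segment_ms
    return units_per_second * math.log2(num_clusters)


# Illustrative values only: wider segments or fewer clusters lower the bitrate.
for k, width in [(256, 20.0), (256, 40.0), (64, 40.0)]:
    print(f"K={k:4d}, segment={width:4.0f} ms -> {unit_bitrate(k, width):6.1f} bits/s")
```

Under these assumptions, doubling the segment width halves the bitrate, while halving the cluster count removes only one bit per unit, so segmentation width is the stronger of the two levers.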