Interleaved Speech Language Models Latently Work In Text

paper · active · provisional · interleaved-speech-language-models-latently-work-in-text-9c14b352 · 1 event · first seen 40h ago

Recent events (1)

arXiv · cs.CL · 40h ago

Interleaved speech-text LMs implicitly transcribe speech in intermediate layers before predicting in text space

A new arXiv paper uses the logit lens to analyze the internal mechanisms of interleaved speech-text language models (SLMs). It finds that these models go through an implicit transcription phase in intermediate layers: the text token of a spoken word becomes decodable even though the models received no explicit speech recognition training. This transcription appears as a top candidate word for up to 77% of the data. The model then predicts the next word in text space before converting back to speech. The findings shed light on how the speech and text modalities interact in the latent space of SLMs and have implications for optimizing SLM training.
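
For readers unfamiliar with the logit lens, the idea is to project each layer's hidden state through the model's output (unembedding) matrix and read off which tokens it already favors. The sketch below is an illustration only, not the paper's code: the function names, the 4-token vocabulary, and all numeric values are hypothetical, and it omits the final layer normalization that a real logit lens usually applies before unembedding.

```python
import numpy as np

def logit_lens_topk(hidden_states, unembed, k=5):
    """Project per-layer hidden states into vocabulary space.

    hidden_states: (num_layers, d_model) vectors at one sequence position.
    unembed: (vocab_size, d_model) output embedding matrix.
    Returns (num_layers, k) token ids, highest logit first.
    """
    logits = hidden_states @ unembed.T  # (num_layers, vocab_size)
    # argsort is ascending: keep the last k columns, then reverse them
    return np.argsort(logits, axis=-1)[:, -k:][:, ::-1]

def first_layer_decodable(hidden_states, unembed, token_id, k=5):
    """First layer index where token_id is in the top-k, or None."""
    topk = logit_lens_topk(hidden_states, unembed, k)
    hits = np.nonzero((topk == token_id).any(axis=-1))[0]
    return int(hits[0]) if hits.size else None

# Hypothetical toy data: 4-token vocabulary with orthogonal embeddings.
unembed = np.eye(4)
# Layers 0-1 point at a "speech" token (id 3), layer 2 at the text token
# of the spoken word (id 1), layer 3 at the next word (id 2).
hidden_states = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 2.0, 0.1, 0.5],
    [0.0, 0.3, 2.0, 0.1],
])

top1 = logit_lens_topk(hidden_states, unembed, k=1)[:, 0]
print(top1.tolist())
print(first_layer_decodable(hidden_states, unembed, token_id=1, k=1))
```

In this toy case the text token of the spoken word first becomes the top candidate at an intermediate layer, and the next-word token takes over at the last layer, mirroring the transcribe-then-predict pattern the summary describes. With a real model, the hidden states would come from the model's per-layer outputs and the unembedding matrix from its output head.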