technique
logit lens
technique · active · provisional
logit-lens-33d546a8 · 1 event · first seen 2d ago
Aliases: logit lens
Recent events (1)
Interleaved speech-text LMs implicitly transcribe speech in intermediate layers before predicting in text space
A new arXiv paper uses the logit lens to analyze the internal mechanisms of interleaved speech-text language models (SLMs). It finds that these models go through an implicit transcription phase in intermediate layers: the text token of a spoken word becomes decodable even though the models receive no explicit speech recognition training. This transcription appears as a top candidate word for up to 77% of the data. The model then predicts the next word in text space before converting back to speech. The findings illuminate how speech and text modalities interact in the latent space of SLMs and have implications for optimizing SLM training.
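For readers unfamiliar with the technique, the following is a minimal sketch of the logit lens idea: take the hidden state at each layer, project it through the model's output (unembedding) matrix, and read off the most probable token. It is not the paper's code. The vocabulary, the `<speech_17>` unit token, the unembedding rows, and the per-layer hidden states are all hypothetical toy values chosen only to show the mechanics, and the final layer normalization that practical implementations usually apply before unembedding is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden_states, unembed, vocab, top_k=1):
    """Decode each layer's hidden state directly into vocabulary space.

    hidden_states: one vector (list of floats) per layer.
    unembed: one row per vocabulary item, same width as a hidden state.
    Returns, per layer, the top_k (token, probability) pairs.
    """
    per_layer = []
    for h in hidden_states:
        # Dot product of the hidden state with every token's output row.
        logits = [sum(a * b for a, b in zip(h, row)) for row in unembed]
        probs = softmax(logits)
        ranked = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)
        per_layer.append([(vocab[i], probs[i]) for i in ranked[:top_k]])
    return per_layer


# Hypothetical toy setup (not data from the paper): two text tokens and
# one speech-unit token, with a 3-dimensional hidden state.
vocab = ["cat", "sat", "<speech_17>"]
unembed = [
    [1.0, 0.0, 0.0],  # "cat"
    [0.0, 1.0, 0.0],  # "sat"
    [0.0, 0.0, 1.0],  # "<speech_17>"
]
# Made-up hidden states for four layers at a single position.
hidden_states = [
    [0.1, 0.0, 2.0],
    [2.0, 0.3, 0.5],
    [0.2, 2.0, 0.1],
    [0.0, 0.4, 2.0],
]

for layer, top in enumerate(logit_lens(hidden_states, unembed, vocab)):
    token, prob = top[0]
    print(f"layer {layer}: {token} (p={prob:.2f})")
```

With a real model, the hidden states would typically come from the model's per-layer outputs and `unembed` would be its output embedding matrix; the per-layer top tokens are what an analysis like the one summarized above inspects.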