technique
factorized FSQ
technique · active · provisional
factorized-fsq-4388ad71 · 1 event · first seen 6d ago
Aliases: factorized FSQ
Recent events (1)
Study finds optimal speech token frame rate for aligning speech with text-native LLM reasoning
Researchers identify a temporal-granularity mismatch as a key cause of reasoning degradation in spoken dialogue models: for the same semantic content, a speech token sequence is far longer than the corresponding text, which dilutes per-token semantic density. The paper introduces factorized FSQ and a non-autoregressive audio LM head to enable low frame rates, then sweeps frame rates from 50 Hz down to 2.08 Hz with a frozen LLM backbone. On speech QA tasks, the results show a consistent optimum at 4.17 Hz with intermediate-layer representation alignment.
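To make the granularity mismatch concrete, the following minimal sketch counts how many speech tokens one utterance produces at the three frame rates named in the summary and compares that with a text token count. The utterance length (10 s), the text token count (30), and the helper name `tokens_for` are illustrative assumptions, not figures or code from the paper. The summary also does not list the intermediate sweep points or any downsampling factors, although 4.17 Hz and 2.08 Hz are numerically close to 50/12 and 50/24.

```python
def tokens_for(duration_s: float, frame_rate_hz: float) -> int:
    """Speech tokens emitted for an utterance: duration times frame rate, rounded."""
    return round(duration_s * frame_rate_hz)


# Frame rates quoted in the summary (endpoints of the sweep and the reported optimum).
FRAME_RATES_HZ = [50.0, 4.17, 2.08]

# Hypothetical utterance, assumed for illustration only:
# 10 seconds of speech whose transcript is about 30 text tokens.
DURATION_S = 10.0
TEXT_TOKENS = 30

for hz in FRAME_RATES_HZ:
    n = tokens_for(DURATION_S, hz)
    ratio = n / TEXT_TOKENS
    print(f"{hz:5.2f} Hz: {n:3d} speech tokens, {ratio:.2f}x the text length")
```

Under these assumed numbers, the 50 Hz stream is more than an order of magnitude longer than the text, while the low-frame-rate streams are of roughly the same length as the text. That is the sense in which lowering the frame rate raises per-token semantic density.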