Almanac
technique

factorized FSQ

technique · active · provisional · factorized-fsq-4388ad71 · 1 event · first seen 6d ago

Aliases: factorized FSQ

Recent events (1)

5 · arXiv · cs.CL · 6d ago

Study finds optimal speech token frame rate for aligning speech with text-native LLM reasoning

Researchers identify a temporal-granularity mismatch as a key cause of reasoning degradation in spoken dialogue models: for the same semantic content, a speech token sequence is far longer than its text counterpart, which dilutes per-token semantic density. The paper introduces factorized FSQ and a non-autoregressive audio LM head to enable low frame rates, then sweeps frame rates from 50 Hz down to 2.08 Hz with the LLM backbone frozen. On speech QA tasks, the results show a consistent optimal regime at 4.17 Hz with intermediate-layer representation alignment.
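To make the frame rates concrete, here is a minimal arithmetic sketch of how many speech tokens a clip produces at each rate. It assumes one token per frame, and it assumes the lower rates come from integer downsampling of the 50 Hz base (factors 12 and 24 reproduce 4.17 Hz and 2.08 Hz). The summary does not state either assumption, and the names `tokens_for`, `frame_rates`, `BASE_HZ`, and `FACTORS` are hypothetical.

```python
BASE_HZ = 50.0

# Assumed downsampling factors (not stated in the summary):
# 50 / 12 = 4.17 Hz and 50 / 24 = 2.08 Hz match the reported rates.
FACTORS = [1, 12, 24]


def frame_rates(base_hz: float = BASE_HZ, factors=FACTORS) -> list[float]:
    """Frame rate in Hz for each downsampling factor."""
    return [base_hz / f for f in factors]


def tokens_for(duration_s: float, frame_rate_hz: float) -> int:
    """Speech tokens for a clip, assuming one token per frame (rounded)."""
    return round(duration_s * frame_rate_hz)


if __name__ == "__main__":
    for factor, hz in zip(FACTORS, frame_rates()):
        n = tokens_for(10.0, hz)
        print(f"{factor:>2}x -> {hz:.2f} Hz, {n} tokens per 10 s")
```

Under these assumptions, a 10-second clip drops from 500 tokens at 50 Hz to 42 at 4.17 Hz and 21 at 2.08 Hz, which is the kind of sequence-length reduction the summary refers to when it describes bringing speech closer to text granularity.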