Entity · technique

factorized FSQ

technique · active · factorized-fsq-4388ad71 · 1 event · first seen Jun 11, 2026

Aliases: factorized FSQ

Co-occurring entities

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

More like this (12)

DeltaFS SFQ-Agent FSE QR factorization with column pivoting Soft Q-Function Subquadratic Fisher-SEP FACTR 2 matrix factorization SQuAD FinQA FFHQ

Recent events (1)

5 · arXiv · cs.CL · Jun 11, 2026

Study finds optimal speech token frame rate for aligning speech with text-native LLM reasoning

Researchers identify a temporal-granularity mismatch as a key cause of reasoning degradation in spoken dialogue models: for the same semantic content, a speech token sequence is far longer than its text counterpart, which dilutes per-token semantic density. The paper introduces factorized FSQ and a non-autoregressive audio LM head to make low frame rates feasible, then sweeps frame rates from 50 Hz down to 2.08 Hz with a frozen LLM backbone. On speech QA tasks, results show a consistent optimal regime at 4.17 Hz with intermediate-layer representation alignment.
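To make the frame-rate figures concrete, the following minimal sketch computes how many speech tokens cover an utterance at each rate named in the summary. The 10-second duration and the function names are hypothetical, chosen only for illustration. The observation that 4.17 Hz and 2.08 Hz are close to 50/12 and 50/24 is plain arithmetic; the summary does not say how the paper actually derives these rates.

```python
import math

BASE_RATE_HZ = 50.0  # highest frame rate in the sweep, per the summary


def tokens_for(duration_s: float, frame_rate_hz: float) -> int:
    """Number of speech tokens (frames) needed to cover duration_s seconds."""
    return math.ceil(duration_s * frame_rate_hz)


def downsample_factor(frame_rate_hz: float, base_hz: float = BASE_RATE_HZ) -> float:
    """Ratio of the base rate to a lower rate (how many base frames per token)."""
    return base_hz / frame_rate_hz


def report(duration_s: float = 10.0) -> None:
    # duration_s is a hypothetical utterance length, not a value from the paper.
    for rate in (50.0, 4.17, 2.08):
        print(
            f"{rate:>5.2f} Hz: {tokens_for(duration_s, rate):>3d} tokens, "
            f"about {downsample_factor(rate):.1f}x fewer than {BASE_RATE_HZ:.0f} Hz"
        )


report()
```

Under these assumptions, a 10-second utterance needs 500 tokens at 50 Hz but only 42 at 4.17 Hz, which illustrates why lowering the frame rate raises per-token semantic density.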

Evaluation and Benchmarking · Multimodal Progress · Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation · factorized FSQ