Semantic-Acoustic Primitives

technique · active · provisional · semantic-acoustic-primitives-c95fe39c · 1 event · first seen 16d ago

Aliases: Semantic-Acoustic Primitives

Recent events (1)

arXiv · cs.CL · 16d ago

UniAudio-Token: Semantic Speech Tokenizer with General Audio Perception for Audio-LLMs

UniAudio-Token is a framework from Tencent that extends semantic speech tokenizers, which are commonly used as the audio interface for Audio-LLMs, so that they also support general audio perception without sacrificing speech quality. It introduces two mechanisms: Semantic-Acoustic Primitives (SAP), a structured supervision scheme that decomposes audio into linguistic, vocal, and auditory-scene components; and Semantic-Acoustic Equilibrium (SAE), a content-aware gating mechanism that restores fine-grained acoustic details from shallow layers. In evaluations where each tokenizer is integrated with downstream LLMs, UniAudio-Token outperforms all single-codebook baseline tokenizers on both understanding and generation tasks. Code, training and inference scripts, and model checkpoints are publicly released.
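The summary above does not specify SAP's label sets, loss functions, or the exact form of the SAE gate, so the following is only a minimal pure-Python sketch under stated assumptions, not the released implementation. It assumes that SAP supervision can be expressed as three optional target groups (linguistic tokens, a vocal class, multi-hot scene labels) combined as a weighted sum of per-group losses, with unannotated groups skipped (for example, a non-speech clip has no linguistic target). It also assumes that SAE can be approximated by a scalar sigmoid gate, computed from the concatenated deep and shallow features, that scales how much of the shallow-layer feature is added back to the deep one. All names (`SAPTargets`, `sap_loss`, `sae_fuse`) and the weighting and fusion choices are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical sketch: SAP targets, losses, and SAE fusion below are assumptions,
# not the UniAudio-Token code.

@dataclass
class SAPTargets:
    """Supervision for one clip, split into three primitive groups (None = not annotated)."""
    linguistic: Optional[List[int]] = None  # assumed: transcript token ids
    vocal: Optional[int] = None             # assumed: one vocal-attribute class id
    scene: Optional[List[int]] = None       # assumed: multi-hot sound-event labels

def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy using a max-shifted log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def binary_cross_entropy(logits: List[float], labels: List[int]) -> float:
    """Mean sigmoid BCE via the stable form softplus(x) - y * x."""
    total = 0.0
    for x, y in zip(logits, labels):
        softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
        total += softplus - y * x
    return total / len(labels)

def sap_loss(preds: Dict[str, list], targets: SAPTargets,
             weights: Dict[str, float]) -> float:
    """Weighted sum of per-group losses; groups without targets contribute nothing."""
    terms: List[float] = []
    if targets.linguistic is not None:
        per_token = [cross_entropy(l, t)
                     for l, t in zip(preds["linguistic"], targets.linguistic)]
        terms.append(weights["linguistic"] * sum(per_token) / len(per_token))
    if targets.vocal is not None:
        terms.append(weights["vocal"] * cross_entropy(preds["vocal"], targets.vocal))
    if targets.scene is not None:
        terms.append(weights["scene"] * binary_cross_entropy(preds["scene"], targets.scene))
    return sum(terms, 0.0)

def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sae_fuse(deep: List[float], shallow: List[float],
             w: List[float], b: float) -> List[float]:
    """Content-aware gate: g depends on both feature vectors; shallow detail is added back scaled by g."""
    g = sigmoid(sum(wi * xi for wi, xi in zip(w, deep + shallow)) + b)
    return [d + g * s for d, s in zip(deep, shallow)]

# Usage: a non-speech clip supervised only on its scene labels.
clip = SAPTargets(scene=[1, 0])
print(sap_loss({"scene": [0.0, 0.0]}, clip, {"scene": 1.0}))  # log(2), about 0.693
print(sae_fuse([1.0, 2.0], [10.0, 20.0], [0.0] * 4, 0.0))    # gate = 0.5 -> [6.0, 12.0]
```

Skipping absent groups, rather than treating them as negatives, is one way a single tokenizer could be trained on a mix of speech, music, and environmental-sound data; whether UniAudio-Token does this is not stated in the summary.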