paper
Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
paper · active · provisional
multi-faceted-interactivity-alignment-in-full-duplex-speech-models-e42a2924 · 1 event · first seen 7d ago
Aliases: Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
More like this (12)
Audio Interaction Model
Adaptive Turn-Taking for Real-time Multi-Party Voice Agents
Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
foreground-background dual-agent voice architecture
Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models
Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
Speaker Group Encoding in Self-supervised Speech Recognition Models
Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation
Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
Recent events (1)
RL-based alignment improves interactivity in full-duplex spoken dialogue models
Researchers propose a post-training alignment method using reinforcement learning to improve interactivity in full-duplex spoken dialogue models, which can listen and speak simultaneously. The method addresses four canonical axes of interactivity—pause handling, turn-taking, backchanneling, and user interruption—each with axis-specific reward functions, plus an LLM-based reward to prevent semantic degradation. The approach is applied to two open-source models, Moshi and PersonaPlex, showing consistent improvements in both offline and real-time multi-turn evaluation.
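
As a rough illustration of the reward structure described above, the sketch below combines four axis-specific scores with an LLM-based semantic score into one scalar. The summary does not say how the paper combines or weights its rewards, so the weighted sum, the score ranges, and every name here (`AXES`, `RewardWeights`, `combined_reward`) are hypothetical and should not be read as the authors' formulation.

```python
from dataclasses import dataclass
from typing import Dict

# The four interactivity axes named in the summary (identifier spellings are ours).
AXES = ("pause_handling", "turn_taking", "backchanneling", "user_interruption")


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the summary gives no values.
    axis: float = 1.0      # weight on the mean axis-specific reward
    semantic: float = 1.0  # weight on the LLM-based semantic reward


def combined_reward(
    axis_scores: Dict[str, float],
    semantic_score: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Assumed combination: weighted sum of mean axis reward and semantic reward."""
    missing = [axis for axis in AXES if axis not in axis_scores]
    if missing:
        raise ValueError(f"missing axis scores: {missing}")
    axis_mean = sum(axis_scores[axis] for axis in AXES) / len(AXES)
    return weights.axis * axis_mean + weights.semantic * semantic_score


# Example: perfect interactivity scores with a middling semantic score.
example = combined_reward({axis: 1.0 for axis in AXES}, semantic_score=0.5)
```

Under this assumed form, the semantic term acts as a counterweight: a policy that improves turn-taking timing while producing degraded content would see its total reward fall.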