Entity · model

Moshi

model · active · moshi-7ca126ac · 3 events · first seen Jun 10, 2026

Aliases: Moshi

More like this (12)

MOSS · MOSAIC · MOSI.AI · MOJO · Molmo · MOOSE-Chem · MOSS-TTS · CMU-MOSEI · MosaicLeaks · AnyMo · MiMo 2.5 · MuSiQue

Recent events (3)

7 · The Batch · Jul 17, 2026

OpenAI GPT-Live Pairs Full-Duplex Voice Models with GPT-5.5 Reasoning Backend

OpenAI released GPT-Live-1 and GPT-Live-1 mini on July 8, 2026, replacing Advanced Voice Mode with a full-duplex voice system that processes audio continuously and delegates harder queries to GPT-5.5 in the background. The architecture separates a real-time conversational voice model from a reasoning model, with user-selectable reasoning effort levels (Instant, Medium, High) routing to GPT-5.5 Instant or GPT-5.5 Thinking accordingly. Performance gains are substantial: GPQA scores jumped from 45.3% (AVM) to 84.2% (GPT-Live-1 at high reasoning), and BrowseComp improved from 0.7% to 75.2%. The system is live globally on iOS, Android, and ChatGPT.com for paid plans, though no developer API has shipped yet.

Frontier Model Releases · Agent and Tool Ecosystem · Thinking Machines · GPT-Live · ChatGPT

6 · arXiv · cs.CL · Jun 15, 2026

BayLing-Duplex: Native full-duplex speech dialogue using a single autoregressive LLM

Researchers introduce BayLing-Duplex, a speech language model that achieves native full-duplex interaction (simultaneous listening and speaking) using a single autoregressive LLM with no auxiliary VAD or turn-taking module. Built by fine-tuning GLM-4-Voice on 400K samples plus a lightweight DPO stage, it reaches 92% turn-taking success and 100% interruption success on InstructS2S-Eval, and improves speech-response quality substantially over Moshi. The approach adds only special tokens to the standard vocabulary, so it can be ported to other LLMs without architectural changes.

Frontier Model Releases · Multimodal Progress · BayLing-Duplex · InstructS2S-Eval · Direct Preference Optimization (DPO)

5 · arXiv · cs.CL · Jun 10, 2026

RL-based alignment improves interactivity in full-duplex spoken dialogue models

Researchers propose a post-training alignment method using reinforcement learning to improve interactivity in full-duplex spoken dialogue models, which can listen and speak simultaneously. The method addresses four canonical axes of interactivity—pause handling, turn-taking, backchanneling, and user interruption—each with axis-specific reward functions, plus an LLM-based reward to prevent semantic degradation. The approach is applied to two open-source models, Moshi and PersonaPlex, showing consistent improvements in both offline and real-time multi-turn evaluation.

Alignment and RLHF · Multimodal Progress · Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models · PersonaPlex · Moshi