Entity · model

Qwen3.5 Omni

model · active · qwen3-5-omni-a943bd43 · 5 events · first seen May 23, 2026

Aliases: Qwen3.5 Omni, Qwen3-Omni, Qwen3.5 Omni Plus

More like this (12)

Qwen3.5 Omni Flash · Qwen2.5-Omni · Qwen3.5-Plus · Qwen3-Omni-Thinking · Qwen3.5 Small · Qwen 3.5 · Qwen3.6-Plus · Qwen3.7-Plus · Qwen3.5 MoE · Qwen 3.7 Max · Qwen 3.5 27B · Qwen2.5-3B

Recent events (5)

6 · arXiv · cs.LG · Jul 21, 2026 · source ↗

FlashRT: Agent harness that converts reference implementations into optimized multi-GPU deployments for real-time multimodal apps

FlashRT is an agent harness that guides coding agents to transform developer-written reference implementations of real-time multimodal pipelines (voice agents, video generation, multimodal LLMs) into optimized multi-GPU deployments. Using a 'chain-of-program' paradigm, the system directs an agent through IR construction, static analysis, and a measurement-gated optimization loop to tune placement, streaming, and parallelism. Across benchmarks on NVIDIA B200 and AMD MI355X GPUs, FlashRT achieves up to ~70x latency reduction and 3.6x throughput improvement, and outperforms the expert vLLM-Omni implementation for Qwen3-Omni text-to-audio inference on AMD hardware. The result suggests agent-driven optimization may be especially valuable on platforms lacking mature expert tooling.

Inference Economics · Agent and Tool Ecosystem · NVIDIA B200 · Qwen3.5 Omni · vllm-omni · +3 more
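The "measurement-gated optimization loop" mentioned in the FlashRT summary can be pictured with a minimal sketch: a candidate configuration replaces the current best only if a real measurement shows it is faster. Everything here is hypothetical and for illustration only: the function names, the `(tensor_parallel, chunk_ms)` configuration, and the synthetic `toy_latency` cost model are assumptions, not FlashRT's actual interface or search strategy.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Config = TypeVar("Config")


def measurement_gated_search(
    baseline: Config,
    propose: Callable[[Config], Iterable[Config]],
    measure: Callable[[Config], float],
    max_rounds: int = 10,
) -> Tuple[Config, float]:
    """Greedy search that accepts a change only when measured latency improves."""
    best, best_latency = baseline, measure(baseline)
    for _ in range(max_rounds):
        improved = False
        for candidate in propose(best):
            latency = measure(candidate)
            # The gate: keep the candidate only if the measurement is better.
            if latency < best_latency:
                best, best_latency = candidate, latency
                improved = True
        if not improved:
            break  # no proposal passed the gate, so stop early
    return best, best_latency


# Hypothetical config: (tensor_parallel degree, streaming chunk size in ms).
def toy_latency(cfg: Tuple[int, int]) -> float:
    """Synthetic stand-in for a real benchmark run (assumption)."""
    tp, chunk_ms = cfg
    return 100.0 / tp + 0.5 * chunk_ms + 3.0 * tp


def neighbours(cfg: Tuple[int, int]) -> Iterable[Tuple[int, int]]:
    """Propose small changes to parallelism and chunk size."""
    tp, chunk_ms = cfg
    for new_tp in (tp // 2, tp * 2):
        if 1 <= new_tp <= 8:
            yield (new_tp, chunk_ms)
    for new_chunk in (chunk_ms - 20, chunk_ms + 20):
        if 20 <= new_chunk <= 200:
            yield (tp, new_chunk)


best, latency = measurement_gated_search((1, 100), neighbours, toy_latency)
print(best, latency)
```

In a real harness `measure` would launch the deployment and time it, so the gate guards against changes that look good in static analysis but regress on hardware.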

7 · The Batch · Jul 8, 2026 · source ↗

GPT-5.6 wider API release imminent after government delay; roundup covers Microsoft MAI shift, Claude Cowork mobile, Nvidia Audex, OpenAI mini voice

OpenAI's GPT-5.6 models are set for broader API release following a Department of Commerce-approved safety review that delayed launch for weeks; GPT-5.6 Sol Ultra scores 91.9% on TerminalBench 2.1 versus Claude Mythos 5 at 88%, with pricing roughly half of Anthropic's comparable tier. Microsoft is actively replacing OpenAI and Anthropic models in Excel, Outlook, and Teams with its internally built MAI models to reduce third-party dependency as its OpenAI discount partnership nears expiration. Anthropic expanded Claude Cowork to web and mobile for Max plan subscribers, with usage data from 1.2 million sessions showing over 90% of use is non-developer work. Nvidia released Audex, a 30B MoE audio-text model that avoids the typical 'text tax' of multimodal models, shipping under a noncommercial license.

Frontier Model Releases · Inference Economics · Claude Mythos · Center for AI Standards and Innovation · Microsoft · +19 more

7 · arXiv · cs.CL · Jun 25, 2026 · source ↗

Study finds real-time voice AI systems ignore vocal delivery cues despite perceiving them

A new arXiv paper evaluates four production real-time voice AI systems — OpenAI GPT Realtime 2, Google Gemini 3.1 Flash Live, Qwen3.5 Omni Plus, and Qwen3.5 Omni Flash — on tasks where vocal delivery (distress, fear, sarcasm) carries meaningful information distinct from word content. All four systems consistently act on words alone, ending calls with crying users who deny distress, approving frightened-voice wire transfers, and accepting sarcastic consent. Critically, three of four systems can correctly identify the emotional state when asked directly, revealing a gap between perception and decision-making the authors term the 'emotional intelligence gap.' Prompting systems to attend to vocal delivery improves performance only partially and inconsistently.

Evaluation and Benchmarking · AI Safety Research · Qwen3.5 Omni Flash · GPT-Realtime-2 · Google · +6 more

6 · arXiv · cs.CL · Jun 1, 2026 · source ↗

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Speech Translation with SpeechLLMs

The paper proposes Decoder-Only Attention (DOA), a training-free streaming policy for simultaneous speech-to-text translation (SimulST) that works with off-the-shelf decoder-only Speech LLMs. DOA derives proxy alignment signals from self-attention rather than cross-attention, enabling long-form simultaneous translation without retraining. Experiments on Phi4-Multimodal and Qwen3-Omni demonstrate low-latency performance approaching offline decoding quality, validating that decoder self-attention contains sufficient alignment information for streaming decisions.

Long Context Evolution · Inference Economics · Phi4-Multimodal · SpeechLLM · Qwen3.5 Omni · +3 more
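To make the idea of an attention-derived streaming policy in the DOA summary more concrete, here is a minimal sketch of a read/write decision driven by attention weights. It uses a generic "last frames" rule chosen purely for illustration: the summary above does not describe the paper's actual decision rule, and the function name, the `guard_frames` parameter, and the input format (one attention weight per audio position in the decoder's context) are all assumptions.

```python
from typing import Sequence


def should_emit(attn_over_audio: Sequence[float], guard_frames: int = 2) -> bool:
    """Decide whether to emit (WRITE) the next token or wait for audio (READ).

    attn_over_audio holds the candidate token's attention weights over the
    audio positions received so far (hypothetical input format). If the
    most-attended position is among the last guard_frames positions, the token
    probably depends on audio that is still arriving, so we wait.
    """
    n = len(attn_over_audio)
    if n == 0:
        return False  # nothing heard yet: keep reading
    peak = max(range(n), key=lambda i: attn_over_audio[i])
    return peak < n - guard_frames


# Attention peaks early in the audio: safe to emit.
print(should_emit([0.1, 0.6, 0.2, 0.05, 0.05]))
# Attention peaks on the newest frame: wait for more audio.
print(should_emit([0.05, 0.05, 0.1, 0.2, 0.6]))
```

A larger `guard_frames` makes the policy more conservative, trading latency for stability of the emitted translation.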

7 · The Batch · May 23, 2026 · source ↗

Thinking Machines Lab Reveals TML-Interaction-Small: Real-Time Multimodal Interaction Model

Thinking Machines Lab (founded by Mira Murati) has announced TML-Interaction-Small, a 276B-parameter mixture-of-experts multimodal model that processes audio, video, and text concurrently using 200ms 'micro-turns' rather than waiting for conversational turns to complete. The architecture uses encoder-free early fusion, pairing a fast foreground interaction model with an asynchronous background reasoning model that shares context. On interactivity benchmarks (FD-bench V1/V1.5), it outperforms GPT-Realtime-2 and Gemini-3.1-flash-live-preview, though it trails GPT-Realtime-2 on intelligence benchmarks. A closed research preview is expected in coming months with wider release later in 2026.

Frontier Model Releases · Inference Economics · encoder-free early fusion · Thinking Machines · GPT-Realtime-2 · +16 more