CUNI submits 1B-parameter simultaneous speech translation system to IWSLT 2026
Researchers from CUNI submit a simultaneous speech translation system to the IWSLT 2026 shared task, built on the offline Canary model with the AlignAtt policy. The system covers the Czech-English, English-German, and English-Italian translation pairs, supports 25 source and 25 target languages, and outperforms similarly sized baselines in both low- and high-latency regimes. At 1B parameters, it is positioned as a compact, multilingual, computationally efficient solution.
Related events (8)
AlignAtt4LLM adapts simultaneous speech translation policy to decoder-only LLMs for IWSLT 2026
Researchers present AlignAtt4LLM, a simultaneous speech translation system for IWSLT 2026 covering English to German, Italian, and Chinese. The system cascades Qwen3-ASR for incremental transcription with Gemma-4 E4B-it for translation, applying a novel AlignAtt policy adapted for decoder-only LLMs that lack encoder-decoder cross-attention. Key contributions include explicit source span prompting, offline alignment head selection, and query/key capture to recover a usable attention-based read/write policy. The system outperforms IWSLT 2026 baselines for European language pairs in both low- and high-latency regimes.
KIT submission to IWSLT 2026 cross-lingual voice cloning track with language tag prompting and RL fine-tuning
Researchers from KIT describe their system for the IWSLT 2026 Cross-Lingual Voice Cloning shared task, which aims to synthesize speech in a target language while preserving source-speaker identity. The system builds on FishAudio-S2-Pro, a multilingual TTS model, and introduces language tag prompting to reduce accent leakage, RL fine-tuning for intelligibility, and a reference-conditioned lexical matching method for domain-specific pronunciation. Language prompting yields the largest gains; lexical matching provides consistent improvements on matched subsets.
ServiceNow AI benchmarks frontier ASR systems on code-switched bilingual speech
ServiceNow AI published a benchmarking study evaluating frontier automatic speech recognition (ASR) systems on code-switched speech, where speakers alternate between two languages mid-conversation. The work targets a practical gap in voice agent deployments serving bilingual customer populations. Results assess how well current ASR models handle this linguistically complex scenario, with implications for enterprise voice AI reliability.
UniCAD: Unified benchmark and multimodal LLM for multi-task CAD learning
Researchers introduce UniCAD, a comprehensive benchmark for multimodal CAD learning covering point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering. Alongside the benchmark, they present UniCAD-MLLM, a single end-to-end multimodal large language model that ingests text, images, sketches, and point clouds to perform all of these tasks. The system achieves state-of-the-art results on both the UniCAD and Fusion360 benchmarks, outperforming task-specific and multi-task baselines. The dataset, code, and pretrained models are to be released.
Benchmarking Local LLMs for Confidential Translation Workflows
This paper evaluates locally runnable LLMs (via Ollama) for offline, privacy-constrained translation workflows targeting freelance translators and smaller language service providers. The authors expand their Reeve Foundation corpus to include German and Simplified Chinese, then benchmark local models across four language directions against commercial NMT systems (DeepL, Baidu), a frontier LLM (GPT-5.2), and professional local NMT systems. Results show substantial performance variation by language direction and model size, with the best local LLMs matching or exceeding local NMT systems and the frontier LLM, though falling short of the top commercial NMT systems. The study supports the viability of local LLMs for confidentiality-sensitive translation use cases.
Qwen-MT Turbo: Alibaba Releases Specialized Translation Model Supporting 92 Languages
Alibaba's Qwen team has released qwen-mt-turbo, a specialized machine translation model built on Qwen3 and trained on trillions of multilingual and translation tokens. The model supports 92 languages and dialects covering over 95% of the global population. It incorporates reinforcement learning techniques to improve translation accuracy and linguistic fluency, and is available via the Qwen API.
Thinking Machines' TML-Interaction-Small 276B-A12B Advances SOTA Realtime Voice and VAD
Thinking Machines has released TML-Interaction-Small, a 276B-A12B mixture-of-experts model targeting native interaction capabilities including realtime voice. The model is reported to advance the state of the art in realtime voice interaction and to supersede standard voice activity detection (VAD) approaches. The item is a brief AINews digest entry from Latent Space with minimal technical detail beyond the headline claims.
BenCzechMark: A Benchmark for Evaluating LLM Czech Language Understanding
BenCzechMark is a new evaluation benchmark designed to assess large language model performance on Czech language tasks. The benchmark addresses the gap in non-English language evaluation, providing a structured way to measure LLM capabilities in Czech across multiple task types. Published on Hugging Face, it contributes to the growing ecosystem of multilingual and language-specific benchmarks.
