3 · arXiv cs.CL (Computation and Language) · 17d ago

CUNI submits 1B-parameter simultaneous speech translation system to IWSLT 2026

Researchers from CUNI submit a simultaneous speech translation system to the IWSLT 2026 shared task, built on the offline Canary model with the AlignAtt policy. The system covers Czech-English and English-German/Italian translation pairs, supports 25 source and 25 target languages, and outperforms similarly sized baselines in both low- and high-latency regimes. At 1B parameters, it is positioned as a compact, multilingual, computationally efficient solution.

Multimodal Progress IWSLT 2026 Canary Charles University (CUNI) AlignAtt

Related guides (1)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act


Related events (8)

4 · arXiv · cs.CL · 17d ago

AlignAtt4LLM adapts simultaneous speech translation policy to decoder-only LLMs for IWSLT 2026

Researchers present AlignAtt4LLM, a simultaneous speech translation system for IWSLT 2026 covering English to German, Italian, and Chinese. The system cascades Qwen3-ASR for incremental transcription with Gemma-4 E4B-it for translation, applying a novel AlignAtt policy adapted for decoder-only LLMs that lack encoder-decoder cross-attention. Key contributions include explicit source span prompting, offline alignment head selection, and query/key capture to recover a usable attention-based read/write policy. The system outperforms IWSLT 2026 baselines for European language pairs in both low- and high-latency regimes.

Evaluation and Benchmarking Multimodal Progress Gemma-4 E4B-it IWSLT 2026 AlignAtt
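
The attention-based read/write policy mentioned in the summary above can be illustrated with a minimal sketch. It shows the general AlignAtt idea as commonly described (emit a candidate token only if its strongest attention does not fall on the most recent audio frames), not the decoder-only adaptation from this entry. The function name `alignatt_emit`, the `guard_frames` parameter, and the toy attention matrix are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def alignatt_emit(
    tokens: Sequence[str],
    attention: Sequence[Sequence[float]],
    guard_frames: int,
) -> List[str]:
    """Return the prefix of `tokens` that looks safe to emit now.

    attention[i][j] is the attention weight from candidate token i to
    source audio frame j. This is an assumed input; a real system would
    read it from selected attention heads.
    """
    emitted: List[str] = []
    for token, row in zip(tokens, attention):
        n_frames = len(row)
        if n_frames == 0:
            break  # no audio received yet, nothing can be emitted
        # Frame that this token attends to most strongly.
        aligned = max(range(n_frames), key=lambda j: row[j])
        # A token leaning on the newest `guard_frames` frames may depend on
        # audio that is still incomplete: stop here and wait (READ).
        if aligned >= n_frames - guard_frames:
            break
        emitted.append(token)  # WRITE
    return emitted


# Toy example: three candidate tokens over five audio frames.
example_tokens = ["Guten", "Morgen", "zusammen"]
example_attention = [
    [0.70, 0.20, 0.05, 0.03, 0.02],  # strongest on frame 0
    [0.10, 0.10, 0.60, 0.10, 0.10],  # strongest on frame 2
    [0.05, 0.05, 0.10, 0.20, 0.60],  # strongest on frame 4 (newest)
]
print(alignatt_emit(example_tokens, example_attention, guard_frames=2))
```

In this sketch a larger `guard_frames` makes the policy wait for more audio before committing (higher latency), while a smaller value emits sooner.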

3 · arXiv · cs.CL · 12d ago

KIT submission to IWSLT 2026 cross-lingual voice cloning track with language tag prompting and RL fine-tuning

Researchers from KIT describe their system for the IWSLT 2026 Cross-Lingual Voice Cloning shared task, which aims to synthesize speech in a target language while preserving source-speaker identity. The system builds on FishAudio-S2-Pro, a multilingual TTS model, and introduces language tag prompting to reduce accent leakage, RL fine-tuning for intelligibility, and a reference-conditioned lexical matching method for domain-specific pronunciation. Language prompting yields the largest gains; lexical matching provides consistent improvements on matched subsets.

Multimodal Progress IWSLT 2026 Cross-Lingual Voice Cloning FishAudio-S2-Pro Karlsruhe Institute of Technology

5 · Hugging Face Blog · 11d ago

ServiceNow AI benchmarks frontier ASR systems on code-switched bilingual speech

ServiceNow AI published a benchmarking study evaluating frontier automatic speech recognition (ASR) systems on code-switched speech, where speakers alternate between two languages mid-conversation. The work targets a practical gap in voice agent deployments serving bilingual customer populations. Results assess how well current ASR models handle this linguistically complex scenario, with implications for enterprise voice AI reliability.

Evaluation and Benchmarking Enterprise Deployment Patterns ServiceNow AI

5 · arXiv · cs.AI · 16d ago

UniCAD: Unified benchmark and multimodal LLM for multi-task CAD learning

Researchers introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning covering point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering. Alongside the benchmark, they present UniCAD-MLLM, a single end-to-end multimodal large language model that ingests text, images, sketches, and point clouds to perform all these tasks. The system achieves state-of-the-art results on both UniCAD and Fusion360 benchmarks, outperforming task-specific and multi-task baselines. Dataset, code, and pretrained models are to be released.

Evaluation and Benchmarking Multimodal Progress Fusion360 UniCAD-MLLM UniCAD

4 · arXiv · cs.CL · 19d ago

Benchmarking Local LLMs for Confidential Translation Workflows

This paper evaluates locally runnable LLMs (via Ollama) for offline, privacy-constrained translation workflows targeting freelance translators and smaller language service providers. The authors expand their Reeve Foundation corpus to include German and Simplified Chinese, then benchmark local models across four language directions against commercial NMTs (DeepL, Baidu), a frontier LLM (GPT-5.2), and professional local NMT systems. Results show substantial performance variation by language direction and model size, with the best local LLMs matching or exceeding local NMT systems and the frontier LLM, though falling short of top commercial NMTs. The study supports the viability of local LLMs for confidentiality-sensitive translation use cases.

Evaluation and Benchmarking Open Weights Progress Ollama GPT-5.2 DeepL

5 · Qwen Research · 1mo ago

Qwen-MT Turbo: Alibaba Releases Specialized Translation Model Supporting 92 Languages

Alibaba's Qwen team has released qwen-mt-turbo, a specialized machine translation model built on Qwen3 and trained on trillions of multilingual and translation tokens. The model supports 92 languages and dialects covering over 95% of the global population. It incorporates reinforcement learning techniques to improve translation accuracy and linguistic fluency, and is available via the Qwen API.

Frontier Model Releases Multimodal Progress Alibaba Qwen API Qwen-MT

6 · Latent Space · 1mo ago

Thinking Machines' TML-Interaction-Small 276B-A12B Advances SOTA Realtime Voice and VAD

Thinking Machines has released TML-Interaction-Small, a 276B-A12B mixture-of-experts model targeting native interaction capabilities including realtime voice. The model is reported to advance state-of-the-art in realtime voice interaction and supersedes standard voice activity detection (VAD) approaches. The item is a brief AINews digest entry from Latent Space with minimal technical detail beyond the headline claims.

Frontier Model Releases Agent and Tool Ecosystem Thinking Machines TML-Interaction-Small Voice Activity Detection (VAD)

4 · Hugging Face Blog · 1mo ago

BenCzechMark: A Benchmark for Evaluating LLM Czech Language Understanding

BenCzechMark is a new evaluation benchmark designed to assess large language model performance on Czech language tasks. The benchmark addresses the gap in non-English language evaluation, providing a structured way to measure LLM capabilities in Czech across multiple task types. Published on Hugging Face, it contributes to the growing ecosystem of multilingual and language-specific benchmarks.

Evaluation and Benchmarking Hugging Face BenCzechMark