Almanac
FishAudio-S2-Pro

model · active · provisional · fishaudio-s2-pro-706cc66f · 1 event · first seen 9d ago

Aliases: FishAudio-S2-Pro

Recent events (1)

3 · arXiv · cs.CL · 9d ago

KIT submission to IWSLT 2026 cross-lingual voice cloning track with language tag prompting and RL fine-tuning

Researchers from KIT describe their system for the IWSLT 2026 Cross-Lingual Voice Cloning shared task, which aims to synthesize speech in a target language while preserving source-speaker identity. The system builds on FishAudio-S2-Pro, a multilingual TTS model, and introduces language tag prompting to reduce accent leakage, RL fine-tuning for intelligibility, and a reference-conditioned lexical matching method for domain-specific pronunciation. Language prompting yields the largest gains; lexical matching provides consistent improvements on matched subsets.