Almanac
model

FastConformer-Large

model · active · provisional · fastconformer-large-6b450640 · 1 event · first seen 13d ago

Aliases: FastConformer-Large

Recent events (1)

5 · arXiv · cs.CL · 13d ago

Synthetic LLM-generated conversations improve ASR training for low-resource languages

Researchers propose a pipeline that uses LLMs to generate scenario-level dialogues and TTS to synthesize multi-speaker audio, creating simulated conversational training data for ASR systems. Evaluated on the Hungarian BEA-Dialogue benchmark, a model trained on 67 hours of real plus 636 hours of synthetic data outperforms a zero-shot model trained on 2,700 hours of real Hungarian speech. The study tests five LLM families under multiple budget and mixing configurations using a FastConformer-Large backbone, finding that generator choice and data composition significantly affect gains.