Entity · paper

Efficient ASR Training with Conversations that Never Happened

paper · active · efficient-asr-training-with-conversations-that-never-happened-ba226706 · 1 event · first seen Jun 3, 2026

Aliases: Efficient ASR Training with Conversations that Never Happened


More like this (12)

Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
REDDIT: Correcting Model-Generated Timestamp Drift in ASR without Forgetting via Replay-Based Distribution Editing
Context-Driven Incremental Compression for Multi-Turn Dialogue Generation
Real-Time Voice AI Hears but Does Not Listen
SpeechLLM Meets Federated Learning for End-to-End ASR: English and Italian Case Studies
Artificial Analysis Conversational Dynamics
Detecting Knowledge Gaps from Conversational AI Interactions Using Curriculum Prerequisite Graphs
Reference-Augmented Training
Interleaved Speech Language Models Latently Work In Text
Audio-Native Speech Recognition with a Frozen Discrete-Diffusion Language Model
Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

Recent events (1)

5 · arXiv · cs.CL · Jun 3, 2026

Synthetic LLM-generated conversations improve ASR training for low-resource languages

Researchers propose a pipeline that uses LLMs to generate scenario-level dialogues and text-to-speech (TTS) to synthesize multi-speaker audio from them, producing simulated conversational training data for ASR. On the Hungarian BEA-Dialogue benchmark, a model trained on 67 hours of real plus 636 hours of synthetic data outperforms a model trained on 2,700 hours of real Hungarian speech and applied zero-shot. Using a FastConformer-Large backbone, the study compares five LLM families as dialogue generators across several data-budget and real/synthetic mixing configurations, and finds that both the choice of generator and the data composition significantly affect the gains.
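The summary names two steps without giving implementation details: rendering an LLM-written dialogue into multi-speaker audio, and mixing real and synthetic data under an hours budget. The following is a minimal sketch of both under stated assumptions, not the paper's pipeline. The `TTS` callable, the voice IDs, the 0.3 s inter-turn pause, the 16 kHz sample rate, and the helper names (`render_dialogue`, `mix`, `fake_tts`) are all hypothetical placeholders.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # assumed sample rate for this sketch

@dataclass
class Utterance:
    audio: List[float]  # mono samples at SAMPLE_RATE
    text: str           # reference transcript
    synthetic: bool     # True if produced by the LLM + TTS pipeline

    @property
    def duration_s(self) -> float:
        return len(self.audio) / SAMPLE_RATE

# Hypothetical TTS interface: (text, voice_id) -> samples. Not a real library API.
TTS = Callable[[str, str], List[float]]

def render_dialogue(turns: Sequence[Tuple[str, str]], voices: Dict[str, str],
                    tts: TTS, pause_s: float = 0.3) -> Utterance:
    """Render an LLM-written dialogue [(speaker, text), ...] as one multi-speaker clip."""
    pause = [0.0] * round(pause_s * SAMPLE_RATE)
    audio: List[float] = []
    for i, (speaker, text) in enumerate(turns):
        if i > 0:
            audio.extend(pause)                     # silence between turns
        audio.extend(tts(text, voices[speaker]))    # each speaker keeps a fixed voice
    transcript = " ".join(text for _, text in turns)
    return Utterance(audio, transcript, synthetic=True)

def mix(real: Sequence[Utterance], synthetic: Sequence[Utterance],
        synth_budget_h: float, seed: int = 0) -> List[Utterance]:
    """Keep all real data; add shuffled synthetic clips while they fit the hour budget."""
    rng = random.Random(seed)
    pool = list(synthetic)
    rng.shuffle(pool)
    budget_s, used_s, chosen = synth_budget_h * 3600, 0.0, []
    for u in pool:
        if used_s + u.duration_s <= budget_s:
            chosen.append(u)
            used_s += u.duration_s
    mixed = list(real) + chosen
    rng.shuffle(mixed)  # interleave real and synthetic examples
    return mixed

def fake_tts(text: str, voice: str) -> List[float]:
    """Stand-in TTS for the demo: 0.4 s of silence per word; voice is ignored."""
    return [0.0] * round(0.4 * SAMPLE_RATE * len(text.split()))

dialogue = [("A", "Jó napot, miben segíthetek?"),
            ("B", "Egy időpontot szeretnék foglalni.")]
clip = render_dialogue(dialogue, {"A": "voice_f1", "B": "voice_m2"}, fake_tts)
```

Here the hour budget is the knob that plays the role of the paper's budget configurations; the real/synthetic ratio follows from it. For the 67 h real + 636 h synthetic configuration reported above, the total is 703 hours, about 90% synthetic, and roughly a quarter of the 2,700-hour real-speech baseline.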

Evaluation and Benchmarking · FastConformer-Large · Efficient ASR Training with Conversations that Never Happened · BEA-Dialogue