Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data

paper · active · leveraging-audio-llms-to-filter-speech-to-speech-training-data-6d3cdad5 · 1 event · first seen Jun 12, 2026

More like this (12)

SpeechLLM Meets Federated Learning for End-to-End ASR: English and Italian Case Studies
Interleaved Speech Language Models Latently Work In Text
LLM-augmented clinical NLP pipeline
In-Place Tokenizer Expansion for Pre-trained LLMs
Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading
Data Selection Through Iterative Self-Filtering for Vision-Language Settings
Audio-LLM
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
Audio-Native Speech Recognition with a Frozen Discrete-Diffusion Language Model
Speaker Group Encoding in Self-supervised Speech Recognition Models
Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
Efficient ASR Training with Conversations that Never Happened

Recent events (1)

4 · arXiv · cs.CL · Jun 12, 2026

Audio-LLM-based data filtering for speech-to-speech translation via Rank-to-Distill

A new arXiv paper proposes using audio large language models (audio-LLMs) to filter noisy training data for end-to-end speech-to-speech translation (S2ST). The authors introduce a two-stage Rank-to-Distill strategy: first, a lightweight ranker generates pseudo-labels from noisy speech pairs; second, those pseudo-labels supervise an audio-LLM that makes keep/drop decisions directly from raw audio. Experiments on the CVSS-C and SpeechMatrix benchmarks show gains of up to 1.4 ASR-BLEU over unfiltered baselines.
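The two-stage flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how the ranker scores pairs or how pseudo-labels are derived from the ranking, so the `SpeechPair` type, the `keep_fraction` cutoff, the toy scores, and every function name here are hypothetical. The audio-LLM fine-tuning step itself is omitted; the sketch only shows how ranked pairs could become keep/drop supervision records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SpeechPair:
    """Hypothetical record for one noisy source/target speech pair."""
    pair_id: str
    source_audio: str  # placeholder path to the source utterance
    target_audio: str  # placeholder path to the translated utterance


def rank_to_pseudo_labels(
    pairs: List[SpeechPair],
    ranker_score: Callable[[SpeechPair], float],
    keep_fraction: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Tuple[SpeechPair, bool]]:
    """Stage 1 (assumed form): rank pairs by score, mark the top fraction as keep."""
    ranked = sorted(pairs, key=ranker_score, reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    return [(pair, index < n_keep) for index, pair in enumerate(ranked)]


def build_distillation_examples(
    labeled: List[Tuple[SpeechPair, bool]],
) -> List[Dict[str, str]]:
    """Stage 2 input (assumed form): audio pair plus a keep/drop target label."""
    return [
        {
            "pair_id": pair.pair_id,
            "source_audio": pair.source_audio,
            "target_audio": pair.target_audio,
            "label": "keep" if keep else "drop",
        }
        for pair, keep in labeled
    ]


# Toy data: the scores stand in for the output of a lightweight ranker.
pairs = [SpeechPair(i, f"{i}_src.wav", f"{i}_tgt.wav") for i in ["a", "b", "c", "d"]]
toy_scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.1}

labeled = rank_to_pseudo_labels(pairs, lambda p: toy_scores[p.pair_id])
examples = build_distillation_examples(labeled)
```

In this sketch the records in `examples` would serve as supervision for the audio-LLM, which at filtering time would see only the two audio files and emit the keep/drop decision.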

Evaluation and Benchmarking · Multimodal Progress · Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data · SpeechMatrix · CVSS-C