paper
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
paper · active · provisional
leveraging-audio-llms-to-filter-speech-to-speech-training-data-6d3cdad5 · 1 event · first seen 5d ago
Aliases: Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
More like this (12)
LLM-augmented clinical NLP pipeline
Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading
Audio-LLM
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
Speaker Group Encoding in Self-supervised Speech Recognition Models
Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
Efficient ASR Training with Conversations that Never Happened
MLSkip: Data Skipping for ML Filters via Lightweight Metadata
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
human-LLM collaborative annotation
Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR
Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models
Recent events (1)
Audio-LLM-based data filtering for speech-to-speech translation via Rank-to-Distill
A new arXiv paper proposes using audio large language models (audio-LLMs) to filter noisy training data for end-to-end speech-to-speech translation (S2ST). The authors introduce a two-stage Rank-to-Distill strategy: a lightweight ranker first generates pseudo-labels from noisy speech pairs, and those pseudo-labels then supervise an audio-LLM that makes keep/drop decisions directly from raw audio. Experiments on the CVSS-C and SpeechMatrix benchmarks show gains of up to 1.4 ASR-BLEU over unfiltered baselines.
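The two-stage flow described in this event can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `SpeechPair` type, the `ranker_score` field, the `keep_fraction` cutoff, and the `student` callable are all hypothetical names introduced here, and the summary does not say how the ranker's scores are turned into pseudo-labels. The sketch assumes a simple top-fraction rule and treats the audio-LLM as an opaque function that returns "keep" or "drop".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class SpeechPair:
    """A source/target speech pair (hypothetical container for illustration)."""
    pair_id: str
    ranker_score: float  # assumed output of the lightweight ranker


def pseudo_label(pairs: Sequence[SpeechPair], keep_fraction: float) -> Dict[str, bool]:
    """Stage 1 (assumed rule): mark the top `keep_fraction` of pairs by score as keep."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    ranked = sorted(pairs, key=lambda p: p.ranker_score, reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    return {p.pair_id: i < n_keep for i, p in enumerate(ranked)}


def build_distillation_set(
    pairs: Sequence[SpeechPair], labels: Dict[str, bool]
) -> List[Tuple[str, str]]:
    """Stage 2 input: (pair_id, "keep"/"drop") targets used to supervise the student."""
    return [(p.pair_id, "keep" if labels[p.pair_id] else "drop") for p in pairs]


def filter_with_student(
    pairs: Sequence[SpeechPair], student: Callable[[SpeechPair], str]
) -> List[SpeechPair]:
    """Apply a trained student (stand-in for the audio-LLM) to keep or drop each pair."""
    return [p for p in pairs if student(p) == "keep"]


# Toy data: four pairs with made-up ranker scores.
pairs = [
    SpeechPair("a", 0.9),
    SpeechPair("b", 0.1),
    SpeechPair("c", 0.5),
    SpeechPair("d", 0.7),
]
labels = pseudo_label(pairs, keep_fraction=0.5)
distill_set = build_distillation_set(pairs, labels)

# Placeholder student that simply replays the pseudo-labels; a real student
# would be a model making the decision from raw audio.
kept = filter_with_student(pairs, lambda p: "keep" if labels[p.pair_id] else "drop")
```

In this toy run the pairs with the two highest scores ("a" and "d") are labeled keep. The point of the second stage, as described above, is that the student model would eventually make this decision from the audio itself rather than from ranker scores.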