4 · Hugging Face Blog · 1mo ago

Fine-Tune MMS Adapter Models for Low-Resource ASR

This Hugging Face blog post provides a technical guide for fine-tuning Meta's Massively Multilingual Speech (MMS) adapter models for automatic speech recognition in low-resource languages. It covers the adapter-based fine-tuning approach that allows efficient adaptation of the MMS model to specific languages without full model retraining. The post targets practitioners working on speech recognition for underrepresented languages.

Open Weights Progress · Agent and Tool Ecosystem · MMS (Massively Multilingual Speech) · Meta AI · adapter fine-tuning · Hugging Face
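To give a rough sense of why adapter-based fine-tuning avoids full retraining, the sketch below counts the weights in a generic bottleneck adapter (a down-projection followed by an up-projection) and compares them with a frozen base model. Every size in it (hidden width 1280, bottleneck 16, 48 layers, one billion base parameters) is an illustrative assumption rather than a figure from the post, and the function names are hypothetical.

```python
def adapter_param_count(hidden: int, bottleneck: int) -> int:
    """Weights and biases of one bottleneck adapter: down-projection plus up-projection."""
    down = hidden * bottleneck + bottleneck   # hidden -> bottleneck, with bias
    up = bottleneck * hidden + hidden         # bottleneck -> hidden, with bias
    return down + up


def trainable_fraction(base_params: int, num_layers: int, hidden: int, bottleneck: int) -> float:
    """Share of all parameters that are trained when only the adapters are updated."""
    adapters = num_layers * adapter_param_count(hidden, bottleneck)
    return adapters / (base_params + adapters)


# Illustrative sizes only (assumptions, not values taken from the post).
per_layer = adapter_param_count(hidden=1280, bottleneck=16)
fraction = trainable_fraction(base_params=1_000_000_000, num_layers=48, hidden=1280, bottleneck=16)
print(f"adapter parameters per layer: {per_layer}")
print(f"trainable share of the model: {fraction:.4%}")
```

Under these assumed sizes the adapters are a small fraction of one percent of the model, which is the property that makes per-language adaptation inexpensive.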

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

3 · Hugging Face Blog · 1mo ago

Fine-Tune W2V2-Bert for Low-Resource ASR with Hugging Face Transformers

Hugging Face published a tutorial on fine-tuning the W2V2-Bert model for automatic speech recognition in low-resource language settings using the Transformers library. The post covers practical steps for adapting the wav2vec2-BERT architecture to languages with limited training data. This is a practitioner-oriented guide targeting the open-source ML community.

Open Weights Progress · wav2vec2-BERT · Hugging Face Transformers · Hugging Face

4 · Hugging Face Blog · 1mo ago

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

This Hugging Face blog post provides a practical guide for fine-tuning OpenAI's Whisper model for multilingual automatic speech recognition using the Transformers library. It covers dataset preparation, training configuration, and evaluation using the Word Error Rate metric. The post targets practitioners seeking to adapt Whisper to low-resource or domain-specific languages.

Open Weights Progress · Agent and Tool Ecosystem · Hugging Face Transformers · Hugging Face · Word Error Rate · +2 more
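For readers unfamiliar with the Word Error Rate metric mentioned in the Whisper summary, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization; the tooling used in the post may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.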

4 · arXiv · cs.CL · 11d ago

Multilingual word-level forced alignment using MMS and learned dynamic programming outperforms MFA

Researchers present a forced alignment system combining Meta's Massively Multilingual Speech (MMS) model with a self-supervised phoneme boundary detector (UnSupSeg) and a learned dynamic programming decoder. Trained on TIMIT and Buckeye, the system outperforms both Montreal Forced Aligner and MMS-based alignment on those datasets and generalizes to unseen languages (Dutch, German, Hebrew) without additional training. The authors claim the approach could scale to the 1,100+ languages supported by MMS, which would make it relevant for low-resource speech processing pipelines.

Multimodal Progress · MMS (Massively Multilingual Speech) · Montreal Forced Aligner · Buckeye · +2 more
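As background for the forced alignment item, the sketch below shows the classic, non-learned dynamic program for monotonic alignment: each audio frame is assigned to one token of a known transcript, tokens appear in order, and the total log-probability is maximized. This is not the paper's learned decoder. The function name and the toy log-probabilities are assumptions made for illustration.

```python
import math


def forced_align(log_probs):
    """Assign each frame to a transcript token with a monotonic Viterbi-style pass.

    log_probs[t][k] is the log-probability that frame t belongs to token k.
    Returns one token index per frame; every token receives at least one frame.
    """
    num_frames, num_tokens = len(log_probs), len(log_probs[0])
    if num_frames < num_tokens:
        raise ValueError("need at least one frame per token")

    neg_inf = -math.inf
    score = [[neg_inf] * num_tokens for _ in range(num_frames)]
    back = [[0] * num_tokens for _ in range(num_frames)]
    score[0][0] = log_probs[0][0]  # the path must start on the first token

    for t in range(1, num_frames):
        for k in range(num_tokens):
            stay = score[t - 1][k]                              # remain on token k
            advance = score[t - 1][k - 1] if k > 0 else neg_inf  # move from token k-1
            if advance > stay:
                best, back[t][k] = advance, k - 1
            else:
                best, back[t][k] = stay, k
            if best > neg_inf:
                score[t][k] = best + log_probs[t][k]

    # Backtrack from the last token at the last frame.
    k = num_tokens - 1
    path = [k]
    for t in range(num_frames - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()
    return path


# Toy input: the first two frames favour token 0, the last two favour token 1.
toy = [[-0.1, -3.0], [-0.2, -2.0], [-3.0, -0.1], [-2.0, -0.3]]
print(forced_align(toy))
```

Word or phoneme boundaries fall wherever the returned token index changes from one frame to the next.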

4 · Hugging Face Blog · 1mo ago

Investing in Performance: Fine-tune small models with LLM insights — a CFM case study

This Hugging Face blog post presents a case study from CFM (Capital Fund Management) on using large language model outputs to guide fine-tuning of smaller, more efficient models for financial applications. The approach leverages LLM-generated signals or labels to train compact models that can be deployed at lower cost and latency. The case study illustrates an enterprise pattern of distilling LLM capabilities into task-specific smaller models for production use.

Inference Economics · Enterprise Deployment Patterns · knowledge distillation · Hugging Face · Capital Fund Management · +1 more

5 · Hugging Face Blog · 1mo ago

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

This Hugging Face blog post provides a technical guide for fine-tuning Microsoft's Florence-2 vision-language models. Florence-2 is a compact yet capable multimodal model supporting tasks like captioning, object detection, and OCR. The post covers practical implementation details for adapting the model to custom datasets using the Hugging Face ecosystem.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Microsoft · Hugging Face · Florence-2 · +1 more

6 · Hugging Face Blog · 1mo ago

Parameter-Efficient Fine-Tuning using 🤗 PEFT

Hugging Face introduces the PEFT library, which enables parameter-efficient fine-tuning of large language models using techniques such as LoRA, prefix tuning, and prompt tuning. The library allows practitioners to adapt large pretrained models to downstream tasks while updating only a small fraction of model parameters, dramatically reducing compute and memory requirements. This lowers the barrier to fine-tuning frontier-scale models on consumer hardware.

Open Weights Progress · Inference Economics · PEFT · LoRA · Hugging Face · +4 more
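The LoRA idea named in the PEFT summary can be sketched in a few lines of plain Python: the pretrained weight matrix W stays frozen, and only two small matrices A and B are trained, with their product added to the layer's output. This is a conceptual illustration rather than the PEFT library's API. The function names, the scaling argument, and the 4096-wide layer used for the parameter count are assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W is frozen, A and B are the trainable factors."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # project down to rank r, then back up
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_fraction(d_in, d_out, rank):
    """Trainable LoRA weights relative to the full d_out x d_in matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)


W = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight
A = [[0.5, -0.5, 1.0]]                   # rank-1 down-projection (1x3)
B = [[0.0], [0.0]]                       # up-projection (2x1), zero-initialized
x = [1.0, 1.0, 1.0]

# With B at zero the adapted layer reproduces the frozen layer exactly.
print(lora_forward(W, A, B, x))
print(lora_trainable_fraction(d_in=4096, d_out=4096, rank=8))
```

Zero-initializing B is a common choice because fine-tuning then starts from the unmodified pretrained behaviour.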

5 · arXiv · cs.LG · 18d ago

ProtoAda: Prototype-Guided Adaptive Adapter Expansion for Multimodal Continual Instruction Tuning

ProtoAda is a new framework for Multimodal Continual Instruction Tuning (MCIT) that addresses a key failure mode in sparse Mixture-of-LoRA-Experts architectures: image-text similarity routing is format-blind and incorrectly merges tasks with similar semantics but different output structures (e.g., coordinate prediction vs. VQA). The method introduces format-aware task prototypes to guide both routing and adapter expansion, then consolidates compatible updates geometrically to reuse and refine existing parameters. Experiments across multiple benchmarks show improved performance, particularly on tasks whose answer formats are vulnerable to corruption by sequential fine-tuning.

Agent and Tool Ecosystem · Alignment and RLHF · Multimodal Large Language Models · ProtoAda · LoRA · +4 more

5 · arXiv · cs.CL · 11d ago

ADAS: Attention-Discounted Adaptive Sampler improves parallel decoding for masked diffusion language models

Researchers propose ADAS, a training-free reranking rule for masked diffusion language model decoding that addresses token interaction failures when several tokens are committed in parallel. The method greedily penalizes candidates that attend strongly to already-selected uncertain positions, using attention weights as soft marginal penalties rather than hard constraints. Evaluated on LLaDA-8B-Base and Dream-7B-Base across GSM8K, MATH500, HumanEval, and MBPP, ADAS improves performance at a low number of function evaluations (NFE) by 9–10 percentage points on average when plugged into existing samplers, with only 3.1% runtime overhead.

Frontier Model Releases · Inference Economics · LLaDA-8B-Base · MATH500 · EB-Sampler · +6 more