Almanac
← Events
5Hugging Face Blog·1mo ago

🤗 PEFT Welcomes New Merging Methods

Hugging Face's PEFT library has added new methods for merging parameter-efficient fine-tuned adapters (e.g., LoRA). The update enables combining multiple fine-tuned adapters into a single model, expanding the toolkit for practitioners working with adapter-based fine-tuning. This is a tooling update relevant to the growing ecosystem of efficient fine-tuning and model composition workflows.

Related guides (4)

Related events (8)

6Hugging Face Blog·1mo ago·source ↗

Parameter-Efficient Fine-Tuning using 🤗 PEFT

Hugging Face introduces the PEFT library, which enables parameter-efficient fine-tuning of large language models using techniques such as LoRA, prefix tuning, and prompt tuning. The library allows practitioners to adapt large pretrained models to downstream tasks while updating only a small fraction of model parameters, dramatically reducing compute and memory requirements. This lowers the barrier to fine-tuning frontier-scale models on consumer hardware.

5Hugging Face Blog·2d ago·source ↗

Hugging Face blog compares fine-tuning techniques beyond LoRA

A Hugging Face blog post examines whether alternative parameter-efficient fine-tuning (PEFT) methods can outperform LoRA, currently the dominant fine-tuning technique. The post likely benchmarks or analyzes competing approaches such as DoRA, IA3, or other PEFT variants against LoRA baselines. This is relevant for practitioners choosing fine-tuning strategies for LLMs.

4Hugging Face Blog·1mo ago·source ↗

Fine-Tuning Gemma Models in Hugging Face

Hugging Face published a guide on fine-tuning Google's Gemma models using parameter-efficient fine-tuning (PEFT) techniques. The post covers practical workflows for adapting Gemma to downstream tasks within the Hugging Face ecosystem. This represents part of the broader tooling support rollout accompanying Gemma's release in February 2024.

7arXiv · cs.CL·18d ago·source ↗

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

This paper reframes parameter-efficient fine-tuning (PEFT) not merely as a cheaper alternative to full fine-tuning, but as a substrate for persistent, instance-specific personal models layered atop shared foundation models. The authors analyze three scaling axes: Scale Up (stronger base models amplifying adapter utility), Scale Down (minimum viable adapter size), and Scale Out (managing millions of concurrent adapted instances). They introduce MinT as an infrastructure reference for adapter identity, versioning, provenance, evaluation, and serving at scale.

4Hugging Face Blog·1mo ago·source ↗

Fast LoRA inference for Flux with Diffusers and PEFT

Hugging Face published a technical blog post detailing optimizations for LoRA inference speed with the Flux image generation model using the Diffusers and PEFT libraries. The post covers techniques to accelerate adapter loading and inference throughput for diffusion models. This is relevant to practitioners deploying fine-tuned image generation models in production or research settings.

6arXiv · cs.LG·23d ago·source ↗

PEFT-Arena: Benchmarking Parameter-Efficient Finetuning via Stability-Plasticity Trade-offs

PEFT-Arena is a new benchmark that evaluates parameter-efficient finetuning methods jointly on downstream task performance and retention of pretrained general capabilities, framing the problem as a stability-plasticity dilemma. Across methods tested under comparable parameter budgets, orthogonal finetuning achieves the best Pareto frontier. The paper provides geometric analyses in both weight space (spectral/singular-value structure) and activation space (representation distortion metrics) to explain why different PEFT methods differ in forgetting behavior. A practical finding is that final SFT checkpoints often overshoot an optimal retention operating point, motivating path-wise rewinding as a post-hoc correction.

6Hugging Face Blog·1mo ago·source ↗

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Hugging Face demonstrates a method for running RLHF fine-tuning on 20-billion-parameter language models using a single 24GB consumer GPU by combining TRL and PEFT (parameter-efficient fine-tuning). The approach uses techniques like LoRA and quantization to dramatically reduce memory requirements. This lowers the hardware barrier for RLHF experimentation from multi-GPU server setups to consumer-grade hardware.

4Hugging Face Blog·1mo ago·source ↗

Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2

Hugging Face published a blog post describing a technique for improving training efficiency by packing multiple short sequences into a single batch using Flash Attention 2. The approach reduces padding waste and improves GPU utilization during LLM fine-tuning. This is a practical infrastructure optimization relevant to practitioners training models on datasets with variable-length sequences.