Almanac
technique

LoRA

technique · active · lora-14760efb · 31 events · first seen 1mo ago

Aliases: LoRA

Recent events (31)

6 · Hugging Face Blog · 1mo ago

TGI Multi-LoRA: Deploy Once, Serve 30 Models

Hugging Face's Text Generation Inference (TGI) introduces Multi-LoRA serving, enabling a single base model deployment to serve up to 30 fine-tuned LoRA adapters simultaneously. This approach reduces infrastructure costs by eliminating the need to deploy separate model instances per fine-tune. The feature targets enterprise use cases where multiple task-specific variants of a base model are needed in production.
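
The idea of one base deployment serving many adapters can be illustrated with a toy example. This is a minimal sketch, not TGI's implementation: the adapter names, the 2x2 weights, and the `forward` helper are all hypothetical. It assumes each adapter only stores its low-rank factors B and A, and that the base weight is loaded once and shared.

```python
# Toy multi-adapter routing: one shared base weight, per-request LoRA factors.
# All names and values here are hypothetical illustrations.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

BASE_W = [[1.0, 0.0], [0.0, 1.0]]  # shared base weight, loaded once

# Each adapter keeps only B (2x1) and A (1x2); the base weight is never copied.
ADAPTERS = {
    "support-bot": {"B": [[1.0], [0.0]], "A": [[0.0, 1.0]]},
    "sql-writer": {"B": [[0.0], [1.0]], "A": [[2.0, 0.0]]},
}

def forward(x, adapter_id=None):
    """Compute W x, plus B (A x) for the adapter named in the request."""
    y = matvec(BASE_W, x)
    if adapter_id is not None:
        adapter = ADAPTERS[adapter_id]
        delta = matvec(adapter["B"], matvec(adapter["A"], x))
        y = [base + d for base, d in zip(y, delta)]
    return y

print(forward([1.0, 2.0]))                 # [1.0, 2.0]
print(forward([1.0, 2.0], "support-bot"))  # [3.0, 2.0]
print(forward([1.0, 2.0], "sql-writer"))   # [1.0, 4.0]
```

Because the per-adapter state is just the small factors, adding another fine-tune in this sketch means adding one dictionary entry rather than another copy of the base weights.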

4 · Hugging Face Blog · 1mo ago

LoRA Training Scripts of the World, Unite!

Hugging Face published a blog post consolidating and comparing advanced LoRA fine-tuning scripts for Stable Diffusion XL, covering techniques such as pivotal tuning, custom captions, and various regularization strategies. The post aims to unify fragmented community training approaches into a more coherent set of best practices. It serves as a practical guide for practitioners fine-tuning SDXL models with LoRA adapters.

5 · Hugging Face Blog · 1mo ago

Goodbye cold boot - how we made LoRA Inference 300% faster

Hugging Face describes an optimization to their inference infrastructure that achieves a 300% speedup for LoRA adapter inference by enabling dynamic loading of adapters without cold boot penalties. The approach allows multiple LoRA adapters to be served efficiently from a single base model, reducing latency for adapter-based deployments. This is relevant to the growing ecosystem of fine-tuned model serving at scale.

5 · Hugging Face Blog · 1mo ago

Using LoRA for Efficient Stable Diffusion Fine-Tuning

This Hugging Face blog post explains how Low-Rank Adaptation (LoRA) can be applied to fine-tune Stable Diffusion models efficiently. LoRA reduces the number of trainable parameters by decomposing weight updates into low-rank matrices, enabling fine-tuning on consumer hardware with significantly less memory. The post covers practical implementation details using the diffusers library.
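
To make the parameter reduction concrete: LoRA replaces a full update to a d_out x d_in weight with the product of two factors, B (d_out x r) and A (r x d_in). The sketch below only counts trainable parameters. The layer size and rank are illustrative assumptions, not values from the post.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full_update_params, lora_params) for one weight matrix."""
    full = d_out * d_in            # every entry of the update is trainable
    lora = rank * (d_in + d_out)   # only B (d_out x r) and A (r x d_in) train
    return full, lora

# Hypothetical 4096 x 4096 projection with rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625
```

In this example the adapter trains under 0.4% of the parameters a full update of the same matrix would need, which is where the memory savings during fine-tuning come from.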

6 · arXiv · cs.CL · 22d ago

Parametric Memory Law for LoRA Finetuning: Quantifying LLM Memory Capacity

This paper introduces the Parametric Memory Law, a power-law relationship linking loss reduction to effective parameters and sequence length during LoRA-based LLM finetuning. The authors identify a phase transition at the token level where prediction probability p > 0.5 constitutes a sufficient condition for verbatim recall under greedy decoding. Building on these findings, they propose MemFT, a threshold-guided optimization strategy that dynamically reallocates training budget toward sub-threshold tokens, improving memory fidelity and efficiency.
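
The p > 0.5 condition is easy to check by hand: if one token holds more than half the probability mass, every other token must hold less than half, so greedy decoding picks it. The sketch below illustrates that reasoning on made-up distributions. It is not the paper's code, and the helper names are hypothetical. It assumes the per-step distributions are conditioned on the correct prefix.

```python
def greedy_pick(probs):
    """Index of the highest-probability token."""
    return max(range(len(probs)), key=probs.__getitem__)

def all_above_half(step_probs, target_ids):
    """True if every target token has probability > 0.5 at its step."""
    return all(p[t] > 0.5 for p, t in zip(step_probs, target_ids))

def recalled_verbatim(step_probs, target_ids):
    """True if greedy decoding reproduces the target sequence exactly."""
    return all(greedy_pick(p) == t for p, t in zip(step_probs, target_ids))

# Made-up distributions over a 3-token vocabulary.
confident = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
print(all_above_half(confident, [0, 1]), recalled_verbatim(confident, [0, 1]))
# True True

# Sufficient, not necessary: 0.4 is below the threshold but still the argmax.
borderline = [[0.4, 0.3, 0.3]]
print(all_above_half(borderline, [0]), recalled_verbatim(borderline, [0]))
# False True
```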

5 · Hugging Face Blog · 2d ago

Hugging Face blog compares fine-tuning techniques beyond LoRA

A Hugging Face blog post examines whether alternative parameter-efficient fine-tuning (PEFT) methods can outperform LoRA, currently the dominant fine-tuning technique. The post likely benchmarks or analyzes competing approaches such as DoRA, IA3, or other PEFT variants against LoRA baselines. This is relevant for practitioners choosing fine-tuning strategies for LLMs.

4 · Hugging Face Blog · 1mo ago

Fast LoRA inference for Flux with Diffusers and PEFT

Hugging Face published a technical blog post detailing optimizations for LoRA inference speed with the Flux image generation model using the Diffusers and PEFT libraries. The post covers techniques to accelerate adapter loading and inference throughput for diffusion models. This is relevant to practitioners deploying fine-tuned image generation models in production or research settings.

5 · Hugging Face Blog · 1mo ago

(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

This Hugging Face blog post covers techniques for fine-tuning the FLUX.1-dev image generation model using LoRA (Low-Rank Adaptation) on consumer-grade hardware. The post likely addresses quantization strategies (QLoRA) to reduce memory requirements, enabling training on GPUs with limited VRAM. This is relevant to the open-weights and accessible fine-tuning ecosystem for diffusion models.

6 · Hugging Face Blog · 1mo ago

SDXL in 4 Steps with Latent Consistency LoRAs

Hugging Face demonstrates combining Latent Consistency Models (LCMs) with LoRA adapters to enable high-quality image generation with Stable Diffusion XL in as few as 4 inference steps. This approach dramatically reduces the number of diffusion steps required compared to standard SDXL, lowering inference latency and compute cost. The technique leverages consistency distillation applied via lightweight LoRA weights, making it accessible without full model retraining.

6 · Hugging Face Blog · 1mo ago

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Hugging Face demonstrates a method for running RLHF fine-tuning on 20-billion-parameter language models using a single 24GB consumer GPU by combining TRL and PEFT (parameter-efficient fine-tuning). The approach uses techniques like LoRA and quantization to dramatically reduce memory requirements. This lowers the hardware barrier for RLHF experimentation from multi-GPU server setups to consumer-grade hardware.
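
A rough back-of-the-envelope calculation shows why quantization matters for fitting a 20B model on a 24GB card. The sketch below counts weight storage only, and the bit widths are illustrative assumptions rather than the post's exact configuration. Activations, adapter gradients, and optimizer state need additional memory on top of this.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 20_000_000_000  # 20B parameters

for bits in (32, 16, 8, 4):
    print(bits, weight_memory_gb(N_PARAMS, bits))
# 32 80.0
# 16 40.0
# 8 20.0
# 4 10.0
```

Under these assumptions, the frozen base weights only fit under 24GB at 8 bits or fewer, and LoRA keeps the trainable part small enough to share the remaining headroom.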

6 · Hugging Face Blog · 1mo ago

Parameter-Efficient Fine-Tuning using 🤗 PEFT

Hugging Face introduces the PEFT library, which enables parameter-efficient fine-tuning of large language models using techniques such as LoRA, prefix tuning, and prompt tuning. The library allows practitioners to adapt large pretrained models to downstream tasks while updating only a small fraction of model parameters, dramatically reducing compute and memory requirements. This lowers the barrier to fine-tuning frontier-scale models on consumer hardware.

5 · arXiv · cs.CL · 1mo ago

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

SMoA is a new parameter-efficient fine-tuning method that addresses LoRA's trade-off between rank size and parameter budget. It partitions model layers into spectral blocks and applies Hadamard-modulated low-rank branches to each diagonal block, enabling broader coverage of pretrained spectral directions without proportionally increasing trainable parameters. Theoretical analysis and empirical results on multiple tasks show SMoA outperforms LoRA and competitive LoRA-style baselines in lower-budget settings.

5 · arXiv · cs.LG · 11d ago

Local linear structures in LLM weights and activations are dynamic, not fixed global directions

A new arXiv paper investigates the nature of linear structures in transformer weights and activations, finding strong local low-rank task-gradient structure but rejecting the hypothesis that fixed task planes exist. The authors show that useful bases drift substantially within 100 optimization steps, yet early recovery updates form a trajectory-prefix basis capturing 77% of LoRA recovery displacement. They also establish a formal connection between parameter perturbations and activation steering, finding a 0.58 cosine similarity between gradient-step-induced activation shifts and CAA steering vectors, suggesting linear structures are evolving local geometries rather than stable global task directions.

5 · The Batch · 1mo ago

Sony and University Researchers Train Robots To Learn Without Catastrophic Forgetting

Researchers from UT Austin, UCLA, Nanyang Technological University, and Sony developed a sequential fine-tuning recipe combining LoRA and on-policy reinforcement learning (GRPO) to reduce catastrophic forgetting in vision-language-action (VLA) models for robotics. Applied to the OpenVLA-OFT model on the LIBERO benchmark, the method achieved 81.2% success on libero-spatial tasks with near-zero forgetting (0.3 percentage point drop), outperforming established continual learning baselines including Dark Experience Replay and Elastic Weight Consolidation. The approach requires no replay of prior task data and also showed modest generalization to unseen tasks. The authors note the method has not yet been tested outside robotics simulation contexts.

5 · Hugging Face Blog · 1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

This Hugging Face blog post details a workflow for fine-tuning NVIDIA's Cosmos Predict 2.5 world model using LoRA and DoRA parameter-efficient techniques for robot video generation tasks. The post covers practical implementation steps for adapting the foundation video model to robotics-specific domains. This represents a concrete application of world models to embodied AI, where synthetic video generation can support robot training data pipelines.

5 · Hugging Face Blog · 1mo ago

🤗 PEFT Welcomes New Merging Methods

Hugging Face's PEFT library has added new methods for merging parameter-efficient fine-tuned adapters (e.g., LoRA). The update enables combining multiple fine-tuned adapters into a single model, expanding the toolkit for practitioners working with adapter-based fine-tuning. This is a tooling update relevant to the growing ecosystem of efficient fine-tuning and model composition workflows.
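
As an illustration of what combining adapters into a single model can mean, the sketch below folds a weighted sum of LoRA deltas (each delta is B times A) into a base weight. This is the simplest linear scheme and is an assumption made for illustration. The summary does not say which merging methods were added, and the function names here are hypothetical, not PEFT's API.

```python
def matmul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def merge_adapters(base_w, adapters, weights):
    """Return base_w + sum_i weights[i] * (B_i @ A_i) as a new matrix."""
    merged = [row[:] for row in base_w]  # copy so the base stays untouched
    for (B, A), w in zip(adapters, weights):
        delta = matmul(B, A)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged

base = [[1.0, 0.0], [0.0, 1.0]]
adapter_a = ([[1.0], [0.0]], [[0.0, 2.0]])  # delta = [[0, 2], [0, 0]]
adapter_b = ([[0.0], [1.0]], [[4.0, 0.0]])  # delta = [[0, 0], [4, 0]]

print(merge_adapters(base, [adapter_a, adapter_b], [0.5, 0.25]))
# [[1.0, 1.0], [1.0, 1.0]]
```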

3 · Hugging Face Blog · 1mo ago

Comparing RoBERTa, Llama 2, and Mistral for Sequence Classification via LoRA on Disaster Tweets

A Hugging Face blog post benchmarks three models—RoBERTa, Llama 2, and Mistral—on a disaster tweet classification task using LoRA fine-tuning. The analysis compares parameter-efficient adaptation of encoder-only versus decoder-only architectures for a practical NLP classification problem. Results provide practitioners with guidance on model selection and LoRA configuration for sequence classification.

6 · arXiv · cs.CL · 29d ago

Hyperfitting Explained: Terminal Geometric Expansion in Final Transformer Layers Drives Diversity Gains

This paper investigates the 'hyperfitting' phenomenon—where fine-tuning LLMs to near-zero loss on small datasets improves open-ended generation and reduces repetition—and demonstrates it is mechanistically distinct from temperature scaling. Entropy-matched control experiments falsify both the temperature-equivalence and static vocabulary reweighting hypotheses, instead localizing the effect to a 'Terminal Expansion' in the final transformer block where feature-space dimensionality expands by ~80.8 dimensions, enabling promotion of deep-tail tokens via context-dependent rank reordering. The authors introduce Late-Stage LoRA, a targeted fine-tuning strategy updating only the final 5 layers, achieving robust generation with minimal parameter updates.

3 · arXiv · cs.CL · 11d ago

Synthetic data bootstrapping and LoRA fine-tuning for Q'eqchi' Mayan NMT without web scraping

Researchers introduce a data synthesis methodology for low-resource neural machine translation of Q'eqchi' Mayan, converting community-sourced dictionaries into a synthetic parallel corpus to avoid scraping target-language data. Using LoRA adapters on mT5-base, the approach achieves BLEU 42.02 on in-domain evaluation but only 0.59 against organic text, revealing a structural-semantic gap. An ablation with multi-task learning produced negative transfer, suggesting LoRA capacity limits conflict with auxiliary objectives. The study concludes synthetic bootstrapping is effective for structural priming but requires authentic data for semantic refinement via curriculum learning.

5 · arXiv · cs.CL · 10d ago

AuRA: Distilling audio understanding into LLMs via LoRA adaptation

AuRA is a new method for integrating speech understanding into LLMs by distilling audio encoding capability directly into LoRA-adapted model weights, bypassing cascaded ASR-LLM pipelines. A lightweight audio embedding layer feeds speech to both an ASR encoder (teacher) and a LoRA-adapted LLM (student), with layer-wise distillation aligning hidden states. The approach claims to outperform cascaded systems, bridge-based adaptation baselines, and large-scale multimodal models on multiple speech-language benchmarks while enabling parallel end-to-end inference without large-scale multimodal training.

4 · Hugging Face Blog · 1mo ago

Personal Copilot: Train Your Own Coding Assistant

This Hugging Face blog post walks through fine-tuning an open-weights code model to create a personalized coding assistant. It covers dataset preparation, training techniques (likely LoRA/PEFT), and deployment considerations for self-hosted code completion. The post targets practitioners who want a GitHub Copilot-like experience without relying on proprietary APIs.

6 · Mistral AI News · 19d ago

Mistral AI Launches Model Customization Suite: Open-Source SDK, Managed Fine-Tuning, and Custom Training

Mistral AI has introduced three tiers of model customization on la Plateforme: an open-source LoRA-based fine-tuning SDK (mistral-finetune) for self-hosted use, serverless managed fine-tuning services via API initially supporting Mistral 7B and Mistral Small, and bespoke custom training services including continuous pretraining for enterprise customers. The managed fine-tuning uses LoRA adapters and claims cost and efficiency advantages over full fine-tuning while maintaining comparable performance. This positions Mistral as a full-stack customization provider competing with OpenAI's fine-tuning API and similar offerings.

5 · arXiv · cs.AI · 15d ago

Code2LoRA: Hypernetwork generates repository-specific LoRA adapters for code models with zero token overhead

Code2LoRA is a hypernetwork framework that generates repository-specific LoRA adapters for code language models, eliminating the inference-time token overhead of RAG or long-context injection. It supports both static repository snapshots and evolving codebases via a GRU-backed adapter updated per code diff. The authors introduce RepoPeftBench, a new benchmark of 604 Python repositories with static and evolution tracks, on which Code2LoRA-Static matches per-repository LoRA fine-tuning upper bounds and Code2LoRA-Evo outperforms a shared LoRA by 5.2 percentage points.

6 · Hugging Face Blog · 1mo ago

GaLore: Advancing Large Model Training on Consumer-grade Hardware

GaLore (Gradient Low-Rank Projection) is a memory-efficient training technique that reduces optimizer state memory by projecting gradients into a low-rank subspace during training, enabling large model training on consumer-grade hardware. The Hugging Face blog post covers integration of GaLore into the transformers and peft ecosystems. Unlike LoRA, which constrains the weight update itself to a low-rank form, GaLore applies the low-rank projection to the gradients, so all parameters are still learned while the memory footprint shrinks. This makes training models like LLaMA-7B feasible on single consumer GPUs.
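
The optimizer-state saving can be estimated with simple counting. The sketch below is an assumption-laden estimate, not code from the post. It assumes an Adam-style optimizer that keeps two moment tensors per parameter, and a GaLore-style setup that keeps those moments only for the projected r x n gradient plus one m x r projection matrix. The layer shape and rank are hypothetical.

```python
def adam_state_elems(m, n):
    """Adam keeps two moments for every entry of an m x n weight."""
    return 2 * m * n

def galore_state_elems(m, n, r):
    """Assumed GaLore-style accounting: one m x r projector plus two
    moments for the projected r x n gradient."""
    return m * r + 2 * r * n

# Hypothetical 4096 x 4096 layer with projection rank 128.
full = adam_state_elems(4096, 4096)
projected = galore_state_elems(4096, 4096, 128)
print(full)       # 33554432
print(projected)  # 1572864
```

The weight matrix itself stays full-size in this accounting. Only the optimizer state shrinks, which matches the description of full-parameter learning with a smaller footprint.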

5 · arXiv · cs.LG · 18d ago

ProtoAda: Prototype-Guided Adaptive Adapter Expansion for Multimodal Continual Instruction Tuning

ProtoAda is a new framework for Multimodal Continual Instruction Tuning (MCIT) that addresses a key failure mode in sparse Mixture-of-LoRA-Experts architectures: image-text similarity routing is format-blind and incorrectly merges tasks with similar semantics but different output structures (e.g., coordinate prediction vs. VQA). The method introduces format-aware task prototypes to guide both routing and adapter expansion, then consolidates compatible updates geometrically to reuse and refine existing parameters. Experiments across multiple benchmarks show improved performance, particularly on tasks whose answer formats are vulnerable to corruption by sequential fine-tuning.

7 · arXiv · cs.CL · 18d ago

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

This paper reframes parameter-efficient fine-tuning (PEFT) not merely as a cheaper alternative to full fine-tuning, but as a substrate for persistent, instance-specific personal models layered atop shared foundation models. The authors analyze three scaling axes: Scale Up (stronger base models amplifying adapter utility), Scale Down (minimum viable adapter size), and Scale Out (managing millions of concurrent adapted instances). They introduce MinT as an infrastructure reference for adapter identity, versioning, provenance, evaluation, and serving at scale.

5 · arXiv · cs.CL · 9d ago

Doc-to-Atom: Compositional parametric memory via semantically typed micro-LoRA adapters

Doc-to-Atom (Doc2Atom) proposes a framework that decomposes documents into semantically typed knowledge atoms, each compiled into an independent micro-LoRA adapter with a retrieval key. At inference, a lightweight query router assembles only relevant atoms into a query-specific adapter injected into a frozen base model, addressing the irrelevant-query interference and scalability problems of monolithic adapter approaches like Doc-to-LoRA. The system is trained end-to-end via multi-objective distillation and outperforms Doc-to-LoRA baselines on six QA benchmarks while reducing memory cost.

7 · arXiv · cs.CL · 22d ago

Reinforcement Learning Recruits a Pre-Existing 'Functional Welfare' Axis in Language Models

Researchers trained language models in a semantically neutral maze environment and extracted concept vectors for rewarded and punished trajectories, finding that RL recruits a pre-existing representational axis encoding functional welfare—how well or badly the system is doing relative to its goals. The punishment vector promotes failure tokens, aligns with negative emotion concepts, and induces refusal and uncertainty when used for steering; the reward vector is its near-antiparallel mirror. Critically, these vectors are effective in models before maze training and appear in pretrain-only models, suggesting the welfare axis pre-exists post-training rather than being created by it. The findings have implications for interpretability, alignment, and understanding how minimal reward signals can broadly reshape model behavior.

5 · arXiv · cs.LG · 3d ago

Multi-source cybersecurity log dataset with ATT&CK labels and SLM fine-tuning evaluation

Researchers introduce a new multi-source cybersecurity log dataset of 870 sessions (~2.3M events) capturing system, network, and browser activity on Windows endpoints, with per-entry MITRE ATT&CK technique labels across 12 tactics and 53 techniques. The dataset addresses gaps in existing public datasets (CICIDS, UNSW-NB15, ATLAS) that lack combined multi-source coverage with fine-grained ATT&CK labeling. Three small language models (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) were fine-tuned with LoRA on the dataset, achieving chunk classification accuracy of 90–97% versus ~8% for base variants, though ATT&CK technique identification remained harder at 42% exact-match accuracy.

5 · GitHub Trending · 2d ago

Lightricks releases LTX-2 official Python inference and LoRA trainer package for audio-video generation

Lightricks has published the official Python package for LTX-2, an audio-video generative model, including both inference and LoRA fine-tuning capabilities. The repository has accumulated 7,474 stars. This represents a notable open-source multimodal generative model release combining audio and video synthesis.

6 · arXiv · cs.AI · 45h ago

CWE-Trace framework reveals LLM vulnerability detection is calibration without comprehension

Researchers introduce CWE-Trace, a benchmark of 834 manually curated Linux kernel samples across 74 CWEs with strict temporal splits to prevent data contamination, used to evaluate 8 vanilla LLMs and 15 LoRA fine-tuned variants on vulnerability detection. Key findings: data contamination provides no measurable advantage (84% of nominally contaminated samples carry no usable memorization signal), and backbone directional priors dominate fine-tuning — models exhibit stable systematic failure modes that resist correction. The best binary detection score reaches only 52.1% (barely above chance) and exact CWE classification Top-1 accuracy stays below 1.3%, indicating fine-tuning shifts output distributions without instilling genuine security reasoning. The work introduces two diagnostic metrics (Directional Failure Index and Hierarchical Distance and Direction) and concludes that detection capability and security understanding are decoupled in current LLMs.