5 · arXiv cs.CL (Computation and Language) · 14h ago

Permutation-Invariant Fine-Tuning (PI-FT) eliminates field-order sensitivity in structured metadata retrieval

Researchers identify that fine-tuned text encoders for structured metadata retrieval silently overfit to field serialization order, losing 7.4 nDCG@10 points when field order changes at index time. They propose PI-FT, a two-line data-loader change that randomizes field order and applies random field dropout during fine-tuning, reducing the order-change penalty to 0.2 points. The paper also introduces DevDataBench, a fully LLM-generated multilingual benchmark covering ~10,000 development statistics indicators across 15 languages, and shows a fine-tuned 118M-parameter CPU encoder outperforms zero-shot text-embedding-3-large (0.707 vs. 0.556 nDCG@10) with strong gains in low-resource languages.
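The summary describes PI-FT only as a two-line data-loader change: randomize field order and randomly drop fields during fine-tuning. Below is a minimal sketch of such a loader step. It assumes each record is a flat dict serialized as `key: value` lines. The function name `serialize_record`, the line format, the default dropout rate of 0.1, and the sample record are illustrative assumptions, not the paper's code.

```python
import random
from typing import Dict


def serialize_record(
    record: Dict[str, str],
    rng: random.Random,
    shuffle_fields: bool = True,
    field_dropout: float = 0.1,
) -> str:
    """Serialize a metadata record as 'key: value' lines.

    Hypothetical PI-FT-style augmentation, not the authors' implementation.
    """
    items = list(record.items())
    if not items:
        return ""
    if shuffle_fields:
        rng.shuffle(items)  # change 1: randomize field order
    # change 2: drop each field independently with probability field_dropout
    kept = [kv for kv in items if rng.random() >= field_dropout]
    if not kept:
        kept = [rng.choice(items)]  # never emit an empty record
    return "\n".join(f"{key}: {value}" for key, value in kept)


# Usage: each epoch sees a different view of the same (made-up) indicator record.
rng = random.Random(0)
record = {
    "title": "GDP growth (annual %)",
    "unit": "Percent",
    "topic": "Economy",
    "source": "National accounts",
}
for _ in range(3):
    print(serialize_record(record, rng))
    print("---")
```

Calling it with `shuffle_fields=False` and `field_dropout=0.0` gives a deterministic serialization, the kind of fixed order an index would typically use.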

Evaluation and Benchmarking · Enterprise Deployment Patterns · PI-FT · DevDataBench · Field Order Should Not Matter: Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval · OpenAI text-embedding-3-large

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality

Related events (8)

6 · Hugging Face Blog · 1mo ago

Parameter-Efficient Fine-Tuning using 🤗 PEFT

Hugging Face introduces the PEFT library, which enables parameter-efficient fine-tuning of large language models using techniques such as LoRA, prefix tuning, and prompt tuning. The library lets practitioners adapt large pretrained models to downstream tasks while updating only a small fraction of the model's parameters, which sharply reduces compute and memory requirements. This lowers the barrier to fine-tuning large models on consumer hardware.
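In the PEFT library, LoRA is configured with `LoraConfig` and applied with `get_peft_model`. The NumPy class below is a standalone illustration of the underlying math, not PEFT's implementation. A frozen weight `W` is augmented with a trainable low-rank product `B @ A` scaled by `alpha / r`, and `B` starts at zero so the adapted layer initially matches the pretrained one. The class name and the default `r=8`, `alpha=16.0` are assumptions chosen for the sketch.

```python
import numpy as np


class LoRALinear:
    """Minimal LoRA-style linear layer: y = x W^T + (alpha / r) * x A^T B^T.

    Only A (r x d_in) and B (d_out x r) would be trained; W stays frozen.
    """

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))     # trainable, small random init
        self.B = np.zeros((d_out, r))                      # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_fraction(self) -> float:
        """Share of this layer's total parameters that are trainable."""
        trainable = self.A.size + self.B.size
        return trainable / (trainable + self.weight.size)


W = np.random.default_rng(1).normal(size=(64, 32))
layer = LoRALinear(W, r=4)
x = np.ones((2, 32))
print(np.allclose(layer(x), x @ W.T))  # True: B is zero at initialization
print(f"{layer.trainable_fraction():.3%}")
```

For a 4096×4096 projection with `r=8`, the adapter adds 8 × (4096 + 4096) = 65,536 parameters next to about 16.8M frozen ones, roughly 0.4%.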

Open Weights Progress · Inference Economics · PEFT · LoRA · Hugging Face

6 · arXiv cs.LG · 1mo ago

PEFT-Arena: Benchmarking Parameter-Efficient Finetuning via Stability-Plasticity Trade-offs

PEFT-Arena is a new benchmark that evaluates parameter-efficient finetuning methods jointly on downstream task performance and retention of pretrained general capabilities, framing the problem as a stability-plasticity dilemma. Across methods tested under comparable parameter budgets, orthogonal finetuning achieves the best Pareto frontier. The paper provides geometric analyses in both weight space (spectral/singular-value structure) and activation space (representation distortion metrics) to explain why different PEFT methods differ in forgetting behavior. A practical finding is that final SFT checkpoints often overshoot an optimal retention operating point, motivating path-wise rewinding as a post-hoc correction.
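A method "achieves the best Pareto frontier" when no alternative beats it on both axes at once. The sketch below computes a frontier over (downstream task score, retention score) pairs, treating both as higher-is-better. The two-axis setup simplifies the benchmark, and the method names and scores are invented for illustration. They are not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, task_score, retention_score)


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points no other point dominates (>= on both axes, > on at least one)."""
    frontier = []
    for name, task, ret in points:
        dominated = any(
            t2 >= task and r2 >= ret and (t2 > task or r2 > ret)
            for _, t2, r2 in points
        )
        if not dominated:
            frontier.append((name, task, ret))
    return sorted(frontier, key=lambda p: p[1])  # order by task score


# Hypothetical (method, downstream score, retention score) triples.
results = [
    ("method_a", 0.80, 0.60),
    ("method_b", 0.75, 0.70),
    ("method_c", 0.70, 0.65),
    ("method_d", 0.85, 0.50),
]
for name, task, ret in pareto_frontier(results):
    print(name, task, ret)
```

Here `method_c` drops out because `method_b` is better on both axes. The same check could be run over checkpoints from a single training run to see which ones trade task score for retention efficiently.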

Evaluation and Benchmarking · Agent and Tool Ecosystem · stability-plasticity dilemma · orthogonal finetuning

7 · The Batch · 25d ago

Fine-tuning LLMs on summary-expansion tasks strips copyright alignment guardrails, enabling up to 92% verbatim book reproduction

Researchers from Stony Brook University, Carnegie Mellon University, and Columbia Law School fine-tuned DeepSeek-V3.1, Gemini 2.5 Pro, and GPT-4o on a task of expanding plot summaries into prose paragraphs, finding that this caused models to regurgitate up to 91.9% of verbatim text from books in their pretraining data. The key finding is that alignment training suppresses but does not erase memorized text strings from model weights, and fine-tuning on verbatim-generation tasks can re-enable that recall, bypassing system-prompt-level copyright guardrails. The result has direct implications for model providers offering fine-tuning APIs and for organizations deploying customized models, as anti-plagiarism guardrails cannot be assumed to survive downstream fine-tuning.
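The summary does not say how the 91.9% figure was computed. One common way to quantify verbatim reproduction is to count how much of a reference passage is covered by word n-grams that also appear exactly in the model output. The sketch below implements that idea as an assumption, not as the paper's metric. The n-gram length `n=8` and the function names are illustrative choices.

```python
from typing import List, Set, Tuple


def _ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_coverage(reference: str, generated: str, n: int = 8) -> float:
    """Fraction of reference words inside some n-gram that occurs verbatim in `generated`."""
    ref = reference.split()
    if not ref:
        return 0.0
    gen_ngrams = _ngrams(generated.split(), n)
    covered = [False] * len(ref)
    for i in range(len(ref) - n + 1):
        if tuple(ref[i:i + n]) in gen_ngrams:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(ref)


print(verbatim_coverage("the cat sat on the mat today", "yesterday the cat sat on the mat", n=4))
```

Whitespace tokenization and a fixed `n` keep the sketch simple. A real evaluation would also need to normalize punctuation and casing and choose `n` so that short common phrases are not counted as copying.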

AI Safety Research · Regulatory Developments · Carnegie Mellon University · Xinyue Liu · DeepSeek V4

6 · arXiv cs.CL · 1mo ago

Hyperfitting Explained: Terminal Geometric Expansion in Final Transformer Layers Drives Diversity Gains

This paper investigates the 'hyperfitting' phenomenon—where fine-tuning LLMs to near-zero loss on small datasets improves open-ended generation and reduces repetition—and demonstrates it is mechanistically distinct from temperature scaling. Entropy-matched control experiments falsify both the temperature-equivalence and static vocabulary reweighting hypotheses, instead localizing the effect to a 'Terminal Expansion' in the final transformer block where feature-space dimensionality expands by ~80.8 dimensions, enabling promotion of deep-tail tokens via context-dependent rank reordering. The authors introduce Late-Stage LoRA, a targeted fine-tuning strategy updating only the final 5 layers, achieving robust generation with minimal parameter updates.
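The distinction from temperature scaling rests on a simple fact. Dividing every logit by the same positive temperature is strictly monotonic, so it can flatten or sharpen the distribution but can never change which token outranks which. The sketch below checks this on made-up logits. Any rank reordering of deep-tail tokens, as the summary attributes to Terminal Expansion, therefore has to come from a context-dependent change in the logits themselves.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_order(probs: List[float]) -> List[int]:
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: -probs[i])


logits = [3.0, 1.0, 0.5, -2.0]  # made-up next-token logits
for t in (0.5, 1.0, 2.0, 5.0):
    probs = softmax(logits, t)
    print(t, rank_order(probs), round(probs[-1], 4))
```

As the temperature rises, the tail token's probability grows, but its rank stays last at every temperature.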

Inference Economics · Alignment and RLHF · Terminal Expansion · large language models · temperature scaling

6 · arXiv cs.CL · 1mo ago

ChunkFT: Memory-Efficient Full Fine-Tuning via Byte-Streamed Chunk Optimization

ChunkFT is a fine-tuning framework that reformulates full-parameter optimization around a dynamically activated working set of sub-tensors, enabling gradient computation without dense gradient materialization. It achieves full-parameter fine-tuning of a 7B model in 13.72GB GPU memory on a single RTX 4090, and scales Llama 3-70B fine-tuning to 2×H800 GPUs. Downstream evaluations on language understanding, math reasoning, and MT-Bench show ChunkFT matches or exceeds full-parameter fine-tuning quality while outperforming existing memory-efficient baselines such as LoRA-class methods. A theoretical convergence analysis in the deterministic setting is also provided.
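The following back-of-envelope arithmetic shows why a 13.72GB footprint for full-parameter fine-tuning of a 7B model stands out. It uses a common rule of thumb for mixed-precision Adam of 16 bytes per parameter (bf16 weights and gradients, plus fp32 master weights and two Adam moments) and ignores activations. The `working_set_bytes` model is a hypothetical simplification in which only a fraction of parameters have gradients and optimizer state on the GPU at once, with the rest assumed to live elsewhere. It is not ChunkFT's actual memory layout.

```python
GIB = 1024 ** 3


def full_ft_bytes(n_params: float) -> float:
    """Rule-of-thumb mixed-precision Adam footprint, excluding activations:
    2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master) + 4 + 4 (Adam m, v) = 16 bytes/param."""
    return n_params * 16


def working_set_bytes(n_params: float, active_fraction: float) -> float:
    """Hypothetical chunked scheme: all bf16 weights resident, but gradients and
    optimizer state (14 bytes/param) only for the active fraction of parameters."""
    return n_params * 2 + n_params * active_fraction * 14


n = 7e9
print(f"naive full fine-tuning: {full_ft_bytes(n) / GIB:.1f} GiB")
print(f"bf16 weights alone:     {n * 2 / GIB:.1f} GiB")
print(f"weights + 1% working set: {working_set_bytes(n, 0.01) / GIB:.1f} GiB")
```

Under these assumptions, the naive estimate is far above what a single 24GB RTX 4090 holds. Once optimizer state is no longer dense, the bf16 weights alone take up most of the remaining budget.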

Training Infrastructure · Open Weights Progress · Llama 3.1 70B · MT-Bench · Meta AI

7 · arXiv cs.CL · 28d ago

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

This paper reframes parameter-efficient fine-tuning (PEFT) not merely as a cheaper alternative to full fine-tuning, but as a substrate for persistent, instance-specific personal models layered atop shared foundation models. The authors analyze three scaling axes: Scale Up (stronger base models amplifying adapter utility), Scale Down (minimum viable adapter size), and Scale Out (managing millions of concurrent adapted instances). They introduce MinT as an infrastructure reference for adapter identity, versioning, provenance, evaluation, and serving at scale.
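The Scale Out axis becomes concrete once you compute adapter storage. The sketch below sizes one LoRA adapter and then a million of them. It assumes a hypothetical 7B-class decoder configuration (32 layers, hidden size 4096, rank 8, adapters on two square projections per layer, bf16 storage). These figures are not from the paper, and the helper names are illustrative.

```python
def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r."""
    return r * (d_in + d_out)


def adapter_bytes(n_layers: int, d_model: int, r: int,
                  matrices_per_layer: int = 2, bytes_per_param: int = 2) -> int:
    """Size of one adapter targeting `matrices_per_layer` d_model x d_model
    projections in every layer (hypothetical configuration)."""
    per_layer = matrices_per_layer * lora_params_per_matrix(d_model, d_model, r)
    return n_layers * per_layer * bytes_per_param


one = adapter_bytes(n_layers=32, d_model=4096, r=8)
print(f"one adapter: {one / 2**20:.1f} MiB")
print(f"1,000,000 adapters: {one * 1_000_000 / 2**40:.2f} TiB")
```

Under these assumptions, each adapter is about 8 MiB and a million of them come to roughly 7.6 TiB. That total is small next to a million full model copies, but large enough that identity, versioning, and serving become infrastructure problems in their own right.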

Training Infrastructure · Inference Economics · LoRA · Parameter-Efficient Fine-Tuning · MinT

4 · arXiv cs.AI · 20h ago

KnowsTFM: Knowledge graph-informed fine-tuning of small tabular foundation models

A new arXiv preprint introduces KnowsTFM, a method for fine-tuning small tabular foundation models (nanoscale TabPFN and TabICL variants) that combines structural attention priors derived from knowledge graphs with parameter-efficient low-rank updates. The approach targets niche domains with scarce, high-dimensional data whose distribution is shifted away from the pretraining data. It shows meaningful gains in specialist settings but only marginal gains on general tasks. The paper also reports a notable failure mode: continual fine-tuning of frontier tabular models can trigger collapse of pretrained knowledge.
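The summary does not specify how the knowledge-graph priors enter attention. One common pattern is an additive bias on the attention logits between features that the graph links. This is an assumption here, not taken from the paper. The sketch below shows that pattern with NumPy. The function name, `prior_weight`, and the toy adjacency matrix are hypothetical.

```python
import numpy as np


def kg_biased_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                        adjacency: np.ndarray, prior_weight: float = 1.0):
    """Scaled dot-product attention over features with an additive KG prior.

    adjacency[i, j] = 1 if the knowledge graph links feature i to feature j, else 0.
    Linked pairs get their logits raised by `prior_weight` before the softmax.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + prior_weight * adjacency
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))  # 4 features with made-up 8-dim embeddings
adjacency = np.zeros((4, 4))
adjacency[0, 1] = adjacency[1, 0] = 1.0  # hypothetical KG edge between features 0 and 1
out, w = kg_biased_attention(q, k, v, adjacency, prior_weight=2.0)
print(np.round(w, 3))
```

Setting `prior_weight=0.0` recovers plain attention. This gives a simple way to ablate how much the graph structure contributes.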

Evaluation and Benchmarking · KnowsTFM · TabPFN · TabICL

6 · arXiv cs.AI · 1mo ago

Demystifying Data Organization for Enhanced LLM Training

This Microsoft Research paper systematically investigates how data organization—distinct from data selection—affects LLM training efficiency across pre-training and SFT stages. The authors formalize four guidelines (Boundary Sharpening, Cyclic Scheduling, Curriculum Continuity, and Local Diversity) and introduce two novel data ordering methods, STR and SAW, that reuse pre-computed sample-level scores with minimal additional overhead. Experiments across multiple model scales and dataset sizes demonstrate improved training stability and performance, with code released publicly.
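The summary stresses that the proposed orderings reuse pre-computed sample-level scores rather than computing new ones. The sketch below shows one way to turn such scores into a training order, loosely inspired by the idea of mixing samples locally. It is not STR or SAW, and it does not implement the paper's definitions of its guidelines. The bucketing scheme, function name, and scores are hypothetical.

```python
from typing import Dict, List


def interleaved_order(scores: Dict[str, float], n_buckets: int = 3) -> List[str]:
    """Order sample IDs using precomputed scores (hypothetical scheme).

    1. Sort samples by score and split them into `n_buckets` contiguous buckets.
    2. Emit samples round-robin across buckets, so any short window of the
       output mixes low-, mid-, and high-score samples.
    """
    ranked = sorted(scores, key=scores.get)
    if not ranked:
        return []
    size = -(-len(ranked) // n_buckets)  # ceiling division
    buckets = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    order = []
    for step in range(size):
        for bucket in buckets:
            if step < len(bucket):
                order.append(bucket[step])
    return order


scores = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.4, "s5": 0.5, "s6": 0.6}  # made-up scores
print(interleaved_order(scores))
```

The only extra cost is a sort over existing scores. That matches the summary's point that score reuse keeps ordering overhead small.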

Training Infrastructure · Alignment and RLHF · Microsoft · Cyclic Scheduling · Local Diversity