PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

paper · active · provisional · pc-layer-polynomial-weight-preconditioning-for-improving-llm-pre-training-26751bd8 · 1 event · first seen 12d ago

Recent events (1)

4 · arXiv · cs.LG · 12d ago

PC Layer: Polynomial weight preconditioning for stable LLM pre-training

Researchers propose a PC (preconditioning) layer that applies a polynomial to each weight matrix during LLM training, reshaping its singular-value spectrum to improve conditioning and training stability. Because the preconditioned weights can be merged back into the original architecture, inference incurs no extra overhead. Experiments on Llama-1B pre-training show advantages over standard transformers with both the AdamW and Muon optimizers, and the paper provides theoretical convergence guarantees for deep linear networks.
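To make the idea of reshaping a singular-value spectrum with a polynomial concrete, here is a minimal NumPy sketch. It is not the paper's method: the function `poly_precondition`, the form W · p(WᵀW), and the coefficients `[1.5, -0.5]` (a Newton-Schulz-style cubic) are all assumptions chosen for illustration. The sketch shows that such a map keeps the weight's shape, sends each singular value σ to σ · p(σ²), and lowers the condition number when the matrix is first scaled to unit spectral norm.

```python
import numpy as np

def poly_precondition(W, coeffs):
    """Return W @ p(W.T @ W), where p(x) = sum_k coeffs[k] * x**k.

    Hypothetical helper for illustration; the polynomial actually used
    by the PC layer is not given in the summary above.
    """
    gram = W.T @ W
    acc = np.zeros_like(gram)
    power = np.eye(gram.shape[0])  # gram**0
    for c in coeffs:
        acc = acc + c * power
        power = power @ gram
    return W @ acc

def condition_number(W):
    """Ratio of largest to smallest singular value."""
    s = np.linalg.svd(W, compute_uv=False)
    return s.max() / s.min()

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
W = W / np.linalg.norm(W, 2)   # scale so the largest singular value is 1

coeffs = [1.5, -0.5]           # assumed: p(x) = 1.5 - 0.5 x
W_pc = poly_precondition(W, coeffs)

s = np.linalg.svd(W, compute_uv=False)
s_pc = np.linalg.svd(W_pc, compute_uv=False)

print("condition number before:", condition_number(W))
print("condition number after: ", condition_number(W_pc))
```

Since `W_pc` is an ordinary matrix of the same shape as `W`, it can be stored in place of the original weight once training ends, which is one way to read the claim that the preconditioned weights merge back with no inference overhead.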