paper
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
paper · active · provisional
pc-layer-polynomial-weight-preconditioning-for-improving-llm-pre-training-26751bd8 · 1 event · first seen 12d ago
Aliases: PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
More like this (12)
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
LLM Pretraining
TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning
CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
Alternating Token-Weighted Unlearning
Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization
q0: Primitives for Hyper-Epoch Pretraining
KL-Cov regularization
Layer-Adaptive Expert Pruning
A sleep-like consolidation mechanism for LLMs
On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study
Recent events (1)
PC Layer: Polynomial weight preconditioning for stable LLM pre-training
Researchers propose a PC (preconditioning) layer that applies polynomial preconditioning to weight matrices during LLM training, reshaping their singular-value spectrum to improve conditioning and training stability. Because the preconditioned weights merge back into the original architecture, inference incurs no overhead. In Llama-1B pre-training experiments, the method shows advantages over standard transformers with both the AdamW and Muon optimizers. The authors also give theoretical convergence guarantees for deep linear networks.
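To make the idea of reshaping a singular-value spectrum concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's method: the odd polynomial form W(c0 + c1 WᵀW + ...), the coefficients (1.5, -0.5), and the helper names `poly_precondition` and `scalar_poly` are all hypothetical. It shows that such a polynomial acts on each singular value independently, and that the result is an ordinary matrix that can be stored in place of the original weight.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rotation(theta):
    """2x2 rotation matrix (orthogonal)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def poly_precondition(W, coeffs):
    """Return sum_k coeffs[k] * W (W^T W)^k, an odd matrix polynomial in W.

    Hypothetical form chosen for illustration; the paper's polynomial may differ.
    """
    gram = matmul(transpose(W), W)
    term = W
    out = [[0.0] * len(W[0]) for _ in W]
    for c in coeffs:
        out = [[o + c * t for o, t in zip(orow, trow)]
               for orow, trow in zip(out, term)]
        term = matmul(term, gram)
    return out

def scalar_poly(sigma, coeffs):
    """The same polynomial applied to a single singular value."""
    return sum(c * sigma ** (2 * k + 1) for k, c in enumerate(coeffs))

# Build W = U diag(sigma) V^T so the singular values are known exactly.
U, V = rotation(0.7), rotation(-1.1)
sigma = [1.0, 0.05]
W = matmul(matmul(U, diag(sigma)), transpose(V))

coeffs = (1.5, -0.5)  # assumed coefficients: sigma -> 1.5*sigma - 0.5*sigma**3
W_pc = poly_precondition(W, coeffs)

# The polynomial leaves U and V unchanged and maps each singular value.
new_sigma = [scalar_poly(s, coeffs) for s in sigma]
expected = matmul(matmul(U, diag(new_sigma)), transpose(V))

cond_before = max(sigma) / min(sigma)
cond_after = max(new_sigma) / min(new_sigma)

# "Merging": W_pc is a plain matrix, so a forward pass is one matmul.
x = [[0.3], [-1.2]]
y_merged = matmul(W_pc, x)
```

With these assumed coefficients the largest singular value stays at 1 while the smallest is lifted, so the condition number drops. Because `W_pc` has the same shape as `W`, it can replace `W` once training is done, which is the sense in which a merged layer adds no inference cost.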