Llama 1B
model · active · provisional
llama-1b-a373f7aa · 1 event · first seen 12d ago
Aliases: Llama 1B
Recent events (1)
PC Layer: Polynomial weight preconditioning for stable LLM pre-training
Researchers propose a PC (preconditioning) layer that applies a polynomial to each weight matrix during LLM training, reshaping its singular-value spectrum to improve conditioning and training stability. At inference time the preconditioned weights merge back into the original architecture, so the layer adds no inference overhead. Experiments on Llama-1B pre-training show advantages over standard transformers with both the AdamW and Muon optimizers. The authors also give theoretical convergence guarantees for deep linear networks.
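To make the idea of reshaping a singular-value spectrum concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's method: the function name `poly_precondition` and the coefficients `(1.5, -0.5)` are hypothetical, chosen only because the resulting map s -> 1.5s - 0.5s^3 lifts small singular values in [0, 1] while leaving s = 1 fixed. The sketch also shows why merging is free at inference: the preconditioned matrix can be computed once and stored as an ordinary weight.

```python
import numpy as np

def poly_precondition(W, coeffs):
    """Return W @ q(W^T W), where q(x) = sum_k coeffs[k] * x**k.

    Each singular value s of W is mapped to s * q(s**2);
    the singular vectors are unchanged.
    (Hypothetical helper for illustration only.)
    """
    G = W.T @ W
    Q = np.zeros_like(G)
    P = np.eye(G.shape[0])  # holds G**k, starting at k = 0
    for c in coeffs:
        Q += c * P
        P = P @ G
    return W @ Q

rng = np.random.default_rng(0)

# Build a test matrix with known singular values in [0.1, 1.0].
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))
V, _ = np.linalg.qr(rng.standard_normal((8, 8)))
s = np.linspace(0.1, 1.0, 8)
W = U @ np.diag(s) @ V.T

# Assumed coefficients: s -> 1.5*s - 0.5*s**3 (not taken from the paper).
coeffs = (1.5, -0.5)
W_pc = poly_precondition(W, coeffs)

s_pc = np.linalg.svd(W_pc, compute_uv=False)
cond_before = s.max() / s.min()
cond_after = s_pc.max() / s_pc.min()

# "Training-time" forward pass recomputes the preconditioned weight;
# "inference-time" forward pass uses the stored, merged matrix.
x = rng.standard_normal(8)
y_train = poly_precondition(W, coeffs) @ x
y_infer = W_pc @ x

print(f"condition number before: {cond_before:.2f}, after: {cond_after:.2f}")
```

Under these assumptions the condition number of the toy matrix drops after preconditioning, and the merged matrix reproduces the layer's output with a single matrix-vector product.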