Llama 1B

model · active · provisional · llama-1b-a373f7aa · 1 event · first seen 12d ago

Aliases: Llama 1B

Recent events (1)

4 · arXiv · cs.LG · 12d ago

PC Layer: Polynomial weight preconditioning for stable LLM pre-training

Researchers propose a PC (preconditioning) layer that applies polynomial preconditioning to reshape the singular-value spectrum of weight matrices during LLM training, keeping the matrices better conditioned and training more stable. Because the preconditioned weights can be merged back into the original architecture, inference incurs no extra overhead. In Llama-1B pre-training experiments, the method shows advantages over a standard transformer with both the AdamW and Muon optimizers, and the authors give theoretical convergence guarantees for deep linear networks.
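To make the mechanism concrete, here is a minimal NumPy sketch, assuming the PC layer applies an odd matrix polynomial to a norm-scaled weight matrix. The coefficients (1.5, -0.5, the classic cubic Newton-Schulz step), the Frobenius normalization, and the names `precondition` and `PCLinear` are hypothetical illustrations, not the paper's actual construction. The key identity is that for W = U diag(s) V^T, the expression a*W + b*(W W^T) W equals U diag(a*s + b*s^3) V^T, so only the singular values change.

```python
import numpy as np

# Hypothetical odd-polynomial coefficients: p(s) = a*s + b*s**3.
# (1.5, -0.5) is the cubic Newton-Schulz step; the PC layer's real
# polynomial may differ.
A_COEF, B_COEF = 1.5, -0.5


def precondition(W: np.ndarray, steps: int = 1) -> np.ndarray:
    """Reshape the singular values of W with an odd matrix polynomial.

    If W = U diag(s) V^T, then a*W + b*(W W^T) W = U diag(a*s + b*s**3) V^T,
    so the singular vectors are untouched and only the spectrum changes.
    """
    # Assumed normalization: the Frobenius norm upper-bounds the spectral
    # norm, so every singular value of X lies in (0, 1].
    X = W / np.linalg.norm(W)
    for _ in range(steps):
        # On (0, 1], p is increasing with p(1) = 1, and p(s)/s shrinks as s
        # grows, so each step pulls small and large singular values closer.
        X = A_COEF * X + B_COEF * (X @ X.T) @ X
    return X


class PCLinear:
    """Hypothetical toy linear layer: stores raw W, preconditions on the fly."""

    def __init__(self, W: np.ndarray, steps: int = 1):
        self.W = W
        self.steps = steps

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Training-time path: the polynomial is evaluated on every call.
        return x @ precondition(self.W, self.steps).T

    def merge(self) -> np.ndarray:
        # Inference-time path: fold the polynomial into a plain weight once,
        # after which the layer is an ordinary matmul with no extra cost.
        return precondition(self.W, self.steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Scale columns over two orders of magnitude to make W ill-conditioned.
    W = rng.standard_normal((64, 32)) * np.linspace(0.01, 1.0, 32)
    layer = PCLinear(W, steps=3)
    print("cond before:", np.linalg.cond(W))
    print("cond after: ", np.linalg.cond(layer.merge()))
```

The merged matrix is used exactly like a standard weight (`y = x @ W_merged.T`), which mirrors the summary's claim that the preconditioning adds no inference overhead. Applying the polynomial more than once (`steps > 1`) compresses the spectrum further toward 1.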