technique
Lightning Attention
technique · active · provisional
lightning-attention-c6cf0029 · 1 event · first seen 7d ago
Aliases: Lightning Attention
Recent events (1)
Conservation laws from data symmetry in neural network gradient-flow training
A new arXiv preprint investigates whether intrinsic symmetries in the training data produce conserved quantities during gradient-flow training of neural networks. The authors prove that, for analytic, non-polynomial loss functions, data symmetries generically do not induce additional integrals of motion, whereas for MSE loss, data augmentation can yield extra conserved quantities. They introduce a framework of 'tensorizable networks' (architectures including linear, polynomial, and Lightning Attention networks) in which parameter and input dependence can be separated via an intermediate representation.
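To make the idea of separating parameter and input dependence concrete, here is a minimal sketch in plain Python. It is not the preprint's definition of a tensorizable network. It assumes that Lightning Attention can be idealized as softmax-free, non-causal attention, O = (Q K^T) V with Q = X W_q, K = X W_k, V = X W_v, and it leaves out masking, decay, normalization, and the tiled computation used in practice. The function names and the `theta`/`phi` split are hypothetical, chosen only for illustration.

```python
import itertools
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def linear_attention(x, w_q, w_k, w_v):
    """Softmax-free, non-causal attention: O = (Q K^T) V (idealized assumption)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    return matmul(matmul(q, transpose(k)), v)


def separated_attention(x, w_q, w_k, w_v):
    """Same output, written as phi(inputs) contracted with theta(parameters)."""
    n, d = len(x), len(x[0])
    h, e = len(w_q[0]), len(w_v[0])
    idx = list(itertools.product(range(d), repeat=3))
    # theta depends only on the weights: (W_q W_k^T)[a][b] * W_v[c][j].
    theta = {
        (a, b, c): [sum(w_q[a][m] * w_k[b][m] for m in range(h)) * w_v[c][j]
                    for j in range(e)]
        for (a, b, c) in idx
    }
    # phi depends only on the inputs: x[i][a] * sum_t x[t][b] * x[t][c].
    phi = [
        {(a, b, c): x[i][a] * sum(x[t][b] * x[t][c] for t in range(n))
         for (a, b, c) in idx}
        for i in range(n)
    ]
    # The output is linear in theta once phi is fixed.
    return [[sum(phi[i][key] * theta[key][j] for key in idx) for j in range(e)]
            for i in range(n)]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = rand_matrix(4, 3, rng)      # 4 tokens, model width 3
    w_q = rand_matrix(3, 2, rng)    # head width 2
    w_k = rand_matrix(3, 2, rng)
    w_v = rand_matrix(3, 2, rng)
    direct = linear_attention(x, w_q, w_k, w_v)
    split = separated_attention(x, w_q, w_k, w_v)
    max_diff = max(abs(direct[i][j] - split[i][j])
                   for i in range(len(direct)) for j in range(len(direct[0])))
    print("max difference between the two computations:", max_diff)
```

Under these assumptions, every output entry is a sum of products of one parameter-only factor and one input-only factor. A softmax between the scores and the values would break this factorization, which is consistent with the summary grouping Lightning Attention with linear and polynomial networks.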