technique
large-batch training
techniqueactive
large-batch-training-589a98cb·1 events·first seen 28d agoAliases: large-batch training
Co-occurring entities
More like this (12)
Recent events (1)
How AI Training Scales: Gradient Noise Scale Predicts Batch Parallelizability
OpenAI researchers report that the gradient noise scale — a statistical metric measuring gradient variance relative to mean — reliably predicts the optimal batch size and degree of parallelizability across a wide range of neural network training tasks. The finding suggests that more complex tasks with noisier gradients can benefit from increasingly large batch sizes, removing a potential ceiling on scaling. The work frames training dynamics as a systematic, measurable process rather than empirical art.