Almanac
technique

large-batch training

technique · active · large-batch-training-589a98cb · 1 event · first seen 28d ago

Aliases: large-batch training

Recent events (1)

6 · OpenAI Blog · 28d ago

How AI Training Scales: Gradient Noise Scale Predicts Batch Parallelizability

OpenAI researchers report that the gradient noise scale, a statistic that measures the variance of the gradient relative to its mean, reliably predicts the optimal batch size and the degree of parallelizability across a wide range of neural network training tasks. The finding suggests that more complex tasks, whose gradients tend to be noisier, can benefit from increasingly large batch sizes, which removes a potential ceiling on scaling. The work frames training dynamics as a systematic, measurable process rather than an empirical art.
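
To make "variance relative to mean" concrete, here is a minimal sketch of one way such a statistic could be estimated. The summary above gives no formula, so this assumes the commonly cited "simple" form, tr(Σ) / |G|², where Σ is the covariance of per-example gradients and G is the true gradient. The function names and the synthetic data are hypothetical, and the plug-in estimator shown is naive (slightly biased for small sample counts), not necessarily the estimator used in the original work.

```python
import random


def simple_noise_scale(per_example_grads):
    """Naive plug-in estimate of tr(Sigma) / |G|^2 (assumed definition).

    per_example_grads: list of equal-length gradient vectors, one per example.
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    # Mean gradient over examples: stands in for the true gradient G.
    mean = [sum(g[j] for g in per_example_grads) / n for j in range(dim)]
    # Sum of unbiased per-coordinate variances: the trace of the covariance.
    trace_cov = sum(
        sum((g[j] - mean[j]) ** 2 for g in per_example_grads) / (n - 1)
        for j in range(dim)
    )
    return trace_cov / sum(m * m for m in mean)


def synthetic_grads(noise_std, n=5000, dim=10, seed=0):
    """Hypothetical data: a true gradient of all ones plus Gaussian noise."""
    rng = random.Random(seed)
    return [[1.0 + rng.gauss(0.0, noise_std) for _ in range(dim)] for _ in range(n)]


quiet = simple_noise_scale(synthetic_grads(noise_std=1.0))
noisy = simple_noise_scale(synthetic_grads(noise_std=3.0))
print(f"quiet task: {quiet:.2f}, noisy task: {noisy:.2f}")
```

By construction the true ratios here are 1 and 9, so the noisier synthetic task yields the larger estimate. Under the reading in the summary, that corresponds to a task that can make use of a larger batch.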