technique
warmdown learning-rate schedule
technique · active · provisional
warmdown-learning-rate-schedule-4dfac1d3 · 1 event · first seen 22d ago
Aliases: warmdown learning-rate schedule
More like this (12)
Alternating Token-Weighted Unlearning · embedding layer learning rate · progressive decay schedule · Class-Incremental Learning · temperature scaling · temporally ordered pre-training · Temporal Difference Learning · inference-time behavioural unlearning · quantization-aware training · test-time training · consistency training · Soft Q-Learning
Recent events (1)
Mapping the Schedule × Bit-Width Boundary in Sub-100M Quantisation-Aware Training
A large factorial grid study (1,345 runs across two phases) tests whether the optimal learning-rate schedule depends on bit-width during from-scratch quantisation-aware training (QAT) of sub-100M decoder language models. The primary hypothesis, that INT6 QAT requires a different schedule than FP16/INT8, is falsified: a 33% warmdown fraction (wd33) is optimal across all precisions and across model sizes from 5M to 350M parameters. For INT4, a regime boundary appears near 50M parameters: above it, wd33 is decisively optimal; below it, differences between schedules fall within seed-level noise. The study also establishes a log-linear scaling law for the INT6 quantisation penalty that successfully predicts held-out model sizes.
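As a rough illustration of what a 33% warmdown fraction means in practice, the sketch below holds the learning rate at its peak and then decays it over the final third of training. The summary does not state the decay shape, the final learning rate, or whether warmup is used, so the linear decay to zero, the optional linear warmup, and the function name `warmdown_lr` with all its parameters are assumptions made for illustration, not the study's implementation.

```python
def warmdown_lr(step, total_steps, peak_lr, warmdown_frac=0.33,
                warmup_steps=0, final_lr=0.0):
    """Learning rate at a 0-indexed `step` under a warmup/plateau/warmdown schedule.

    Assumed shape (not confirmed by the source): optional linear warmup,
    constant plateau at `peak_lr`, then linear decay toward `final_lr`
    over the last `warmdown_frac` of training.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    if not 0.0 <= warmdown_frac <= 1.0:
        raise ValueError("warmdown_frac must lie in [0, 1]")

    # Number of steps spent decaying, and the step at which decay begins.
    warmdown_steps = int(round(warmdown_frac * total_steps))
    warmdown_start = total_steps - warmdown_steps

    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmdown_start:
        # Plateau: hold the peak learning rate.
        return peak_lr
    # Warmdown: interpolate linearly from peak_lr toward final_lr.
    progress = (step - warmdown_start) / warmdown_steps
    return peak_lr + (final_lr - peak_lr) * progress


if __name__ == "__main__":
    # wd33 on a hypothetical 1000-step run: plateau until step 670, then decay.
    for s in (0, 669, 670, 835, 999):
        print(s, warmdown_lr(s, total_steps=1000, peak_lr=1e-3))
```

With these assumptions, a 1000-step run stays at the peak rate through step 669 and reaches half the peak at step 835. Other warmdown fractions can be compared by changing `warmdown_frac` alone, which is the single axis a schedule grid of this kind would vary.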