Entity · technique

warmdown learning-rate schedule

technique · active · warmdown-learning-rate-schedule-4dfac1d3 · 1 event · first seen May 26, 2026

Aliases: warmdown learning-rate schedule

Co-occurring entities

Quantisation-Aware Training (QAT) · AdamW · INT4 quantisation

More like this (12)

FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation · Alternating Token-Weighted Unlearning · embedding layer learning rate · progressive decay schedule · Class-Incremental Learning · temperature scaling · temporally ordered pre-training · Temporal Difference Learning · inference-time behavioural unlearning · quantization-aware training · Asynchronous Noise Schedule · test-time training

Recent events (1)

5 · arXiv · cs.CL · May 26, 2026

Mapping the Schedule × Bit-Width Boundary in Sub-100M Quantisation-Aware Training

A large factorial grid study (1,345 runs in total across two phases) tests whether the optimal learning-rate schedule differs by bit-width during from-scratch quantisation-aware training (QAT) of sub-100M decoder language models. The primary hypothesis, that INT6 QAT requires a different schedule than FP16/INT8, is falsified; a 33% warmdown fraction (wd33) is optimal across all precisions and model sizes from 5M to 350M. For INT4, a regime boundary is identified near 50M parameters: above it, wd33 is decisively optimal; below it, schedule choice falls within seed-level noise. The study also establishes a log-linear scaling law for the INT6 quantisation penalty that successfully predicts the penalty at held-out model sizes.
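
As a rough illustration of what a 33% warmdown fraction means in practice, the sketch below holds the learning rate constant and then decays it over the final third of training. The summary does not state the decay shape, the final learning rate, or whether a warm-up phase is used, so linear decay to zero with no warm-up is an assumption here, and the function name `warmdown_lr` and its parameters are hypothetical.

```python
def warmdown_lr(step: int, total_steps: int, base_lr: float,
                warmdown_frac: float = 0.33, min_lr: float = 0.0) -> float:
    """Constant learning rate, then a linear decay over the final
    `warmdown_frac` of training.

    Assumptions (not taken from the source): linear decay shape,
    decay target `min_lr` defaulting to 0, and no warm-up phase.
    """
    if not 0.0 <= warmdown_frac <= 1.0:
        raise ValueError("warmdown_frac must lie in [0, 1]")

    warmdown_steps = int(round(warmdown_frac * total_steps))
    warmdown_start = total_steps - warmdown_steps

    # Constant phase: before the warmdown begins (or if there is no warmdown).
    if warmdown_steps == 0 or step < warmdown_start:
        return base_lr

    # Warmdown phase: fraction of the warmdown window still remaining.
    remaining = max(total_steps - step, 0) / warmdown_steps
    return min_lr + (base_lr - min_lr) * remaining


if __name__ == "__main__":
    # Print the schedule at a few steps of a hypothetical 1000-step run.
    for s in (0, 669, 670, 835, 1000):
        print(s, warmdown_lr(s, total_steps=1000, base_lr=1e-3))
```

With `total_steps=1000` and the default fraction, the rate stays at `base_lr` through step 669 and falls linearly from step 670 onward. In PyTorch, a schedule of this shape could be supplied as a multiplicative factor through `torch.optim.lr_scheduler.LambdaLR` by dividing the returned value by `base_lr`.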

Training Infrastructure · Open Weights Progress · warmdown learning-rate schedule · Quantisation-Aware Training (QAT) · AdamW