paper
The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model
paper · active · provisional
the-energy-consumption-of-transformer-fine-tuning-a-roofline-inspired-scaling-model-ebc08afa · 1 event · first seen 34h ago
Aliases: The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model
More like this (12)
transformer architecture
feed-forward transformer
Energy-Based Transformers as Predictors of Reading Difficulty
Variable-Width Transformers
Parameter-Efficient Fine-Tuning
causal transformer
transformer-based neural renderer
Dynamic Short Convolutions Improve Transformers
Sparse-structure Multimodal Diffusion Transformer
Fine-Tuning for Financial Utility
Graph Transformer
Transformer Language Models
Recent events (1)
Roofline-inspired scaling model predicts Transformer fine-tuning energy consumption across GPU configurations
A new arXiv preprint presents a framework for modeling energy consumption during Transformer training on multiple GPUs, using BERT architectural sweeps to relate measured energy to proxies for compute, memory traffic, and hardware efficiency. The approach adapts roofline modeling with a speedup-based hardware-efficiency factor that accounts for tensor parallelism and fully sharded data parallelism. The resulting scaling law accurately predicts training energy across heterogeneous configurations, targeting sustainable and cost-aware system design.
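The kind of model the summary describes can be illustrated with a minimal sketch. The preprint's actual functional form is not given here, so everything below is an assumption made for illustration: energy is taken to be a weighted sum of a compute proxy (FLOPs) and a memory-traffic proxy (bytes moved), divided by a hardware-efficiency factor defined as measured speedup over one GPU divided by the GPU count. The function names `hardware_efficiency`, `fit_energy_model`, and `predict_energy` are hypothetical.

```python
def hardware_efficiency(speedup, n_gpus):
    """Assumed efficiency factor: speedup over one GPU divided by GPU count."""
    if n_gpus < 1 or speedup <= 0:
        raise ValueError("n_gpus must be >= 1 and speedup must be positive")
    return speedup / n_gpus


def fit_energy_model(runs):
    """Least-squares fit of energy = (a * flops + b * bytes_moved) / efficiency.

    `runs` is a list of (flops, bytes_moved, efficiency, energy) tuples taken
    from measured training runs. Returns the coefficients (a, b).
    The additive form is an illustrative assumption, not the paper's model.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for flops, bytes_moved, eff, energy in runs:
        # Efficiency-scaled features for this run.
        x1 = flops / eff
        x2 = bytes_moved / eff
        # Accumulate the 2x2 normal equations.
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * energy
        t2 += x2 * energy
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("runs do not determine both coefficients")
    # Solve the normal equations with Cramer's rule.
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


def predict_energy(coeffs, flops, bytes_moved, eff):
    """Predict energy for an unseen configuration from fitted coefficients."""
    a, b = coeffs
    return (a * flops + b * bytes_moved) / eff
```

In use, one would measure energy over an architectural sweep, fit `(a, b)` once, and then call `predict_energy` for a new model size or GPU configuration. Expressing FLOPs and bytes in scaled units (for example PFLOPs and TB) keeps the normal equations well conditioned.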