paper

The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model

paper · active · provisional · the-energy-consumption-of-transformer-fine-tuning-a-roofline-inspired-scaling-model-ebc08afa · 1 event · first seen 34h ago

Co-occurring entities

BERT

More like this (12)

transformer architecture
feed-forward transformer
Energy-Based Transformers as Predictors of Reading Difficulty
Variable-Width Transformers
Parameter-Efficient Fine-Tuning
causal transformer
transformer-based neural renderer
Dynamic Short Convolutions Improve Transformers
Sparse-structure Multimodal Diffusion Transformer
Fine-Tuning for Financial Utility
Graph Transformer
Transformer Language Models

Recent events (1)

5 · arXiv · cs.CL · 34h ago

Roofline-inspired scaling model predicts Transformer fine-tuning energy consumption across GPU configurations

A new arXiv preprint presents a framework for modeling the energy consumed by Transformer training on multiple GPUs. It uses BERT architectural sweeps to relate measured energy to proxies for compute, memory traffic, and hardware efficiency. The approach adapts roofline modeling by adding a speedup-based hardware-efficiency factor that accounts for tensor parallelism and fully sharded data parallelism. The authors report that the resulting scaling law accurately predicts training energy across heterogeneous configurations, with the aim of supporting sustainable and cost-aware system design.
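To make the idea concrete, the following is a minimal Python sketch of a roofline-style energy estimate. It is not the preprint's fitted scaling law, which this summary does not reproduce. It combines the classical roofline bound (run time is limited by either compute or memory traffic) with an assumed efficiency factor defined as measured speedup divided by GPU count. The names `Hardware`, `roofline_time`, and `estimate_energy_joules`, the constant per-GPU power draw, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical per-GPU characteristics (illustrative values only)."""
    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bandwidth: float  # peak memory bandwidth, bytes/s
    avg_power_w: float    # assumed constant average power draw, watts


def roofline_time(flops: float, bytes_moved: float, hw: Hardware) -> float:
    """Classical roofline lower bound on single-GPU run time in seconds.

    The workload is limited by whichever is slower: doing the arithmetic
    or moving the data.
    """
    return max(flops / hw.peak_flops, bytes_moved / hw.mem_bandwidth)


def estimate_energy_joules(flops: float, bytes_moved: float, hw: Hardware,
                           n_gpus: int, speedup: float) -> float:
    """Energy estimate under the assumptions stated above (not the paper's law).

    `speedup` is the measured speedup over one GPU for a given parallelism
    strategy. The implied efficiency is speedup / n_gpus, so the result
    equals the single-GPU energy divided by that efficiency.
    """
    if n_gpus < 1 or speedup <= 0:
        raise ValueError("n_gpus must be >= 1 and speedup must be positive")
    wall_time = roofline_time(flops, bytes_moved, hw) / speedup
    return n_gpus * hw.avg_power_w * wall_time


# Usage with made-up numbers: a compute-bound job on 4 GPUs that only
# achieves a 2x speedup (50% parallel efficiency).
gpu = Hardware(peak_flops=1e14, mem_bandwidth=1e12, avg_power_w=300.0)
print(estimate_energy_joules(1e15, 1e12, gpu, n_gpus=4, speedup=2.0))  # 6000.0
```

In this sketch, halving the parallel efficiency doubles the estimated energy relative to a single GPU, which is the kind of effect a speedup-based efficiency factor is meant to capture.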

Training Infrastructure · Inference Economics · The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model · BERT