
The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model

paper · active · provisional · the-energy-consumption-of-transformer-fine-tuning-a-roofline-inspired-scaling-model-ebc08afa · 1 event · first seen 34h ago

Recent events (1)

arXiv · cs.CL · 34h ago

Roofline-inspired scaling model predicts Transformer fine-tuning energy consumption across GPU configurations

A new arXiv preprint presents a framework for modeling the energy consumed by Transformer fine-tuning on multi-GPU systems. Using architectural sweeps over BERT, it relates measured energy to proxies for compute, memory traffic, and hardware efficiency. The approach adapts roofline modeling by adding a speedup-based hardware-efficiency factor that accounts for tensor parallelism and fully sharded data parallelism. According to the authors, the resulting scaling law accurately predicts training energy across heterogeneous configurations, with the aim of supporting sustainable and cost-aware system design.
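
To make the idea concrete, here is a minimal Python sketch of what a roofline-style energy estimate with a speedup-based efficiency factor could look like. It is not the preprint's actual scaling law, whose functional form and fitted coefficients are not given in this summary. Every name here (`Workload`, `roofline_time_s`, `predicted_energy_j`), the use of a constant average power per GPU, and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    flops: float        # total floating-point operations (compute proxy)
    bytes_moved: float  # total memory traffic in bytes (memory proxy)


def roofline_time_s(w: Workload, peak_flops: float, bandwidth: float) -> float:
    """Single-GPU roofline time: the slower of the compute and memory bounds."""
    return max(w.flops / peak_flops, w.bytes_moved / bandwidth)


def hardware_efficiency(speedup: float, n_gpus: int) -> float:
    """Parallel efficiency: measured speedup over one GPU divided by GPU count."""
    if n_gpus < 1 or speedup <= 0:
        raise ValueError("n_gpus must be >= 1 and speedup must be positive")
    return speedup / n_gpus


def predicted_energy_j(w: Workload, n_gpus: int, speedup: float,
                       peak_flops: float, bandwidth: float,
                       avg_power_w: float) -> float:
    """Energy = GPUs x average power x wall time (ASSUMED form, not the paper's).

    Wall time is the single-GPU roofline time divided by the measured speedup,
    so energy grows as parallel efficiency drops.
    """
    wall_time = roofline_time_s(w, peak_flops, bandwidth) / speedup
    return n_gpus * avg_power_w * wall_time


# Hypothetical numbers, for illustration only.
job = Workload(flops=1e15, bytes_moved=1e12)
single = predicted_energy_j(job, n_gpus=1, speedup=1.0,
                            peak_flops=1e14, bandwidth=1e12, avg_power_w=300.0)
sharded = predicted_energy_j(job, n_gpus=4, speedup=3.2,
                             peak_flops=1e14, bandwidth=1e12, avg_power_w=300.0)
print(f"1 GPU: {single:.0f} J, 4 GPUs: {sharded:.0f} J")
```

Under these assumptions the multi-GPU energy equals the single-GPU energy divided by the parallel efficiency, so a 4-GPU run with a 3.2x speedup (efficiency 0.8) costs 25% more energy than the single-GPU run. A fitted model like the one the preprint describes would replace the fixed power and peak-rate constants with coefficients estimated from measured energy.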