Characterizing Training Performance and Energy for Foundation Models and Image Classifiers on Multi-Instance GPUs
Abstract
GPUs are becoming a scarce resource in high demand, as many teams build and train increasingly advanced artificial intelligence workloads. As GPUs become more performant, they also consume more energy, with NVIDIA's recent A100 and H100 GPUs drawing up to 400 W and 700 W of power, respectively. This paper characterizes how best to scale down a large modern GPU to suit workloads that cannot fully exploit an entire GPU. We measure six workloads, ranging from 14-million-parameter image classifiers to 1.5-billion-parameter large language models, and find that partitioned GPUs with a mix of small, medium, and large partitions can deliver up to 33% lower energy demand and 9% higher training throughput from a single GPU. Fine-tuning existing models shows particularly high potential, with 55% faster training at 42% less energy. Our results suggest that multiplexing small workloads onto spatially partitioned GPUs can improve the efficiency of a single GPU while giving clients access to smaller slices of the latest GPUs that better suit their jobs' demands.