Talk

A Practical Guide To Benchmarking AI and GPU Workloads in Kubernetes

Abstract

Effective benchmarking is required to optimize GPU resource efficiency and enhance performance for AI workloads. This talk provides a practical guide on setting up, configuring, and running various GPU and AI workload benchmarks in Kubernetes.

The talk covers benchmarks for a range of use cases, including model serving, model training, and GPU stress testing. The tools covered include NVIDIA Triton Inference Server; fmperf, an open-source tool for benchmarking LLM serving performance; MLPerf, an open benchmark suite for comparing the performance of machine learning systems; GPUStressTest; gpu-burn; and cuda benchmark. The talk also introduces GPU monitoring and load generation tools.

Through step-by-step demonstrations, attendees will gain practical experience using benchmark tools. They will learn how to run benchmarks on GPUs in Kubernetes effectively and how to use existing tools to tune GPU resource and workload management for better performance and resource efficiency.

Related