Jinho Hwang, Larisa Shwartz, et al.
ICSE-SEIP 2021
This paper presents a trace-driven experimentation and analytics framework that allows researchers and engineers to devise and evaluate operational strategies for large-scale AI workflow systems. Analytics data from a production-grade AI platform developed at IBM are used to build a comprehensive system and simulation model. Synthetic traces are made available for ad-hoc exploration as well as statistical analysis of experiments to test and examine pipeline scheduling, cluster resource allocation, and similar operational mechanisms.
Praveen Venkateswaran, Vatche Isahagian, et al.
CLOUD 2023
Thomas Rausch, Waldemar Hummer, et al.
HotEdge/USENIX ATC 2019
Jiaqin Yuan, Michele Merler, et al.
ACL 2023