Neural network-based task scheduling with preemptive fan control
Bilge Acun, E. K. Lee, et al.
E2SC 2016
System administrators can employ various diagnostic tests to identify failures in high-performance computing systems, but manual analysis of the results can be time-consuming. Moreover, executing these tests occupies system resources, and individual diagnostic results represent only the instantaneous state of the system. In this paper, we propose the use of a directional relation graph to summarize and visualize diagnostic results over time. The graph visually represents the frequency of different test failures and the relations among failures within a specific time range. We demonstrate the directional relation graph using diagnostic results collected while synthetic anomalies were executing. Furthermore, we discuss how graph analysis of the relations among failures can narrow the suite of tests and so reduce overall test time.
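The abstract describes the directional relation graph only at a high level. The following is one possible construction, not the authors' method. It assumes each diagnostic failure is recorded as a (timestamp, test name) pair, uses per-test failure counts as node weights, and adds a directed edge A → B whenever test B fails within a hypothetical `window` after test A, restricted to the chosen time range. The names `build_relation_graph` and `to_dot` are illustrative. The graph is emitted as Graphviz DOT text so it can be rendered for visual inspection.

```python
from collections import Counter


def build_relation_graph(failures, window, start, end):
    """Build an assumed directional relation graph from (timestamp, test) failures.

    Nodes: test name -> number of failures in [start, end].
    Edges: (a, b) -> number of times b failed within `window` after a failed.
    """
    # Keep only failures inside the requested time range, in time order.
    events = sorted((t, name) for t, name in failures if start <= t <= end)
    nodes = Counter(name for _, name in events)
    edges = Counter()
    for i, (t_a, a) in enumerate(events):
        # Scan forward until the next failure falls outside the window.
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break
            if b != a:
                edges[(a, b)] += 1
    return nodes, edges


def to_dot(nodes, edges):
    """Render the graph as Graphviz DOT; edge thickness tracks co-occurrence count."""
    lines = ["digraph failures {"]
    for name, count in sorted(nodes.items()):
        lines.append(f'  "{name}" [label="{name} ({count})"];')
    for (a, b), weight in sorted(edges.items()):
        lines.append(f'  "{a}" -> "{b}" [label="{weight}", penwidth={weight}];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical failure log: (seconds since start, failing test).
    log = [(0, "mem"), (5, "net"), (8, "disk"), (30, "mem"), (33, "net")]
    nodes, edges = build_relation_graph(log, window=10, start=0, end=100)
    print(to_dot(nodes, edges))
```

Edge weights count ordered failure pairs, so a test that fails repeatedly can contribute several pairs. An analysis aimed at narrowing the test suite would likely normalize edge weights by node counts before judging that one test's failure reliably predicts another's.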
Bilge Acun, E. K. Lee, et al.
E2SC 2016
Miloš Puzović, E. K. Lee, et al.
SFI
Bilge Acun, Alper Buyuktosunoglu, et al.
HPCA 2019
Jingoo Han, Luna Xu, et al.
CLUSTER 2019