Dorothea Wiesmann, Roland Germann, et al.
Journal of the Optical Society of America B: Optical Physics
The Predictive Analytics for Server Incident Reduction (PASIR) solution developed at IBM has been deployed to 130 IT environments since the beginning of 2014. The infrastructures of these IT environments, which span various industries around the world, are serviced by IBM support groups. Incidents occurring on servers are reported, together with descriptions of the problems, into a ticket management system. The assigned support teams then resolve these tickets and record the resolution steps taken in the system. PASIR works in two steps. First, it uses ticket descriptions and resolutions to classify the incident tickets of an IT environment and identify high-impact incidents, namely those describing server unavailability and performance degradation. Second, it correlates the occurrence of these high-impact tickets with server properties and utilization measures through multivariate analysis, in order to identify troubled server configurations and prescribe improvement actions. In this paper, we present the findings from deploying our two-step machine learning model in the field. In particular, we describe the PASIR methodology, from ticket classification to the recommendation of modernization actions. We also assess the process of manual ticket labeling and the impact of noisy input data on our automatic classifier, and we demonstrate the model's effectiveness by comparing the predicted impact of prescriptive actions with actual system improvements.
Ioana Giurgiu, Jacint Szabo, et al.
Middleware 2017
Ioana Giurgiu, Dorothea Wiesmann, et al.
SYSTOR 2017
Ioana Giurgiu, Jasmina Bogojeska, et al.
CCGrid 2014