An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions

Published: 2021-07 Issue: 4 Volume: 30 Page: 1-38
ISSN: 1049-331X
Container-title: ACM Transactions on Software Engineering and Methodology
Language: en
Short-container-title: ACM Trans. Softw. Eng. Methodol.

Author:

Lyu Yingzhe¹, Li Heng², Sayagh Mohammed³, Jiang Zhen Ming (Jack)⁴, Hassan Ahmed E.¹

Affiliation:

1. Queen’s University, Canada

2. Polytechnique Montreal, Canada

3. ETS - Quebec University, Canada

4. York University, Canada

Abstract

AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive data produced during the operations of large-scale systems. However, due to the nature of operation data, AIOps modeling faces several challenges related to data splitting, such as imbalanced data, data leakage, and concept drift. In this work, we study the data leakage and concept drift challenges in the context of AIOps and evaluate the impact of different modeling decisions on these challenges. Specifically, we perform a case study on two commonly studied AIOps applications: (1) predicting job failures based on trace data from a large-scale cluster environment and (2) predicting disk failures based on disk monitoring data from a large-scale cloud storage environment. First, we observe that the data leakage issue exists in AIOps solutions. Splitting training and validation datasets by time can significantly reduce such data leakage, making it more appropriate than random splitting in the AIOps context. Second, we show that AIOps solutions suffer from concept drift. Periodically updating AIOps models can help mitigate the impact of such concept drift, while the performance benefit and the modeling cost of increasing the update frequency depend largely on the application data and the models used. Our findings encourage future studies and practices on developing AIOps solutions to pay attention to their data-splitting decisions in order to handle the data leakage and concept drift challenges.
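
The two data-splitting ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record shape `(timestamp, payload)`, the function names, and the 80/20 ratio are all assumptions made for illustration. It contrasts a random split with a time-based split, and shows one simple way to lay out periodic model updates so that each model is only evaluated on data newer than its training data.

```python
import random
from typing import Iterator, List, Tuple

# Hypothetical record shape: (timestamp, payload). Not taken from the paper.
Sample = Tuple[int, str]


def random_split(samples: List[Sample], train_ratio: float = 0.8,
                 seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle, then cut. Training data may be newer than validation data."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def time_based_split(samples: List[Sample], train_ratio: float = 0.8
                     ) -> Tuple[List[Sample], List[Sample]]:
    """Sort by timestamp, then cut. Training data always precedes validation."""
    ordered = sorted(samples, key=lambda s: s[0])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


def leaks_future(train: List[Sample], validation: List[Sample]) -> bool:
    """True if any training sample is newer than the oldest validation sample."""
    return max(t for t, _ in train) > min(t for t, _ in validation)


def periodic_update_windows(samples: List[Sample], n_periods: int
                            ) -> Iterator[Tuple[List[Sample], List[Sample]]]:
    """Yield (train, test) pairs for a model retrained at each period boundary.

    Assumed scheme: train on all earlier periods, test on the next one.
    Trailing samples that do not fill a whole period are dropped.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    size = len(ordered) // n_periods
    periods = [ordered[i * size:(i + 1) * size] for i in range(n_periods)]
    for i in range(1, n_periods):
        train = [s for period in periods[:i] for s in period]
        yield train, periods[i]


# Synthetic data: 100 events with increasing timestamps.
samples = [(t, f"event-{t}") for t in range(100)]
train, val = time_based_split(samples)
windows = list(periodic_update_windows(samples, n_periods=5))
```

Calling `leaks_future(train, val)` on the time-based split returns `False` by construction, whereas a random split of time-ordered data will usually mix future samples into the training set. Raising `n_periods` corresponds to updating the model more often, which is where the cost-versus-benefit trade-off mentioned in the abstract comes in.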

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Link

https://dl.acm.org/doi/pdf/10.1145/3447876

References: 91 articles.

1. MIMETIC: Mobile encrypted traffic classification using multimodal deep learning

2. Is "better data" better than "better data miners"?

3. Software Engineering for Machine Learning: A Case Study

4. Predicting Disk Replacement towards Reliable Data Centers

Cited by 24 articles.

1. Evidence of interrelated cognitive-like capabilities in large language models: Indications of artificial general intelligence or achievement?; Intelligence; 2024-09

2. On the Model Update Strategies for Supervised Learning in AIOps Solutions; ACM Transactions on Software Engineering and Methodology; 2024-08-26

3. The impact of concept drift and data leakage on log level prediction models; Empirical Software Engineering; 2024-07-25

4. Estimating S-wave amplitude for earthquake early warning in New Zealand: Leveraging the first 3 seconds of P-Wave; Earth Science Informatics; 2024-07-13

5. What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow; Empirical Software Engineering; 2024-07-03