Affiliation:
1. Nankai University, China
2. Alibaba (Beijing) Software Services Co., Ltd., China
3. AI Application Research Center, Huawei Technologies Co., China
4. Tsinghua University, China
Abstract
Accurate and efficient localization of root cause instances in large-scale microservice systems is of paramount importance. Unfortunately, prevailing methods face several limitations. Notably, some recent methods rely on supervised learning, which requires a substantial amount of labeled data. However, labeling root cause instances is time-consuming and laborious, especially when the data spans multiple modalities such as logs, traces, and metrics. Moreover, some approaches favor deep learning for localization but lack interpretability and mechanisms for continuous improvement.
To address the above challenges, we propose DeepHunt, a novel root cause localization method based on multimodal data analysis. First, DeepHunt introduces a Root Cause Score (RCS) that integrates reconstruction errors with failure propagation patterns (upstream-downstream relationships), making the localization of root causes interpretable. Second, it adopts a Graph Autoencoder (GAE) to address the limitation imposed by scarce labeled data. Third, it employs data augmentation to mitigate the adverse effects of insufficient historical training samples. We evaluate DeepHunt on two open-source datasets, where it outperforms existing methods in a zero-label cold-start setting. DeepHunt can be further improved through continuous fine-tuning driven by a feedback mechanism.
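The abstract does not give the RCS formula, so the following is only an illustrative sketch of the general idea of combining an instance's own reconstruction error with the errors of the instances that call it. The function name `root_cause_scores`, the weight `alpha`, the averaging rule, the instance names, and the error values are all assumptions made for illustration and are not taken from DeepHunt.

```python
from collections import defaultdict


def root_cause_scores(errors, calls, alpha=0.5):
    """Rank instances by a toy score (NOT the paper's RCS definition).

    errors: dict mapping instance name -> reconstruction error
            (hypothetical values, as an autoencoder might produce).
    calls:  list of (caller, callee) edges in the call graph.
    alpha:  assumed weight of the propagated term.
    """
    # Index the callers of each instance. The assumption here is that a
    # failure propagates from a callee to the instances that call it.
    callers = defaultdict(list)
    for caller, callee in calls:
        callers[callee].append(caller)

    scores = {}
    for inst, err in errors.items():
        ups = callers.get(inst, [])
        # Mean reconstruction error of direct callers; 0.0 if there are none.
        propagated = (
            sum(errors.get(u, 0.0) for u in ups) / len(ups) if ups else 0.0
        )
        scores[inst] = err + alpha * propagated

    # Highest score first: the most likely root cause under this toy rule.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: frontend -> cart -> db, and frontend -> search.
errors = {"frontend": 0.6, "cart": 0.7, "db": 0.9, "search": 0.1}
calls = [("frontend", "cart"), ("cart", "db"), ("frontend", "search")]
ranking = root_cause_scores(errors, calls)
```

In this toy setup `db` ranks first because both its own error and its caller's error are high, which mirrors the intuition that a score tied to upstream-downstream relationships is easier to interpret than a raw anomaly score alone.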
Publisher
Association for Computing Machinery (ACM)