Author:
Chen Juan, Zhang Rui, Chen Peng, Ren Jianhua, Wu Zongling, Wang Yang, Li Xi, Xiong Ling
Abstract
The rapid adoption of microservice architecture in the cloud has made it necessary to effectively detect, classify, and diagnose runtime failures in microservice applications. Because cloud environments are highly dynamic and the dependencies between microservices are complex, robust real-time fault identification is challenging. This paper proposes an interpretable fault diagnosis framework tailored for microservice architecture, namely Multi-scale Learnable Transformation Graph for Fault Classification and Diagnosis (MTG_CD). First, we employ multi-scale neural transformation and graph structure adjacency matrix learning to enhance data diversity while extracting temporal-structural features from system monitoring metrics. Second, a graph convolutional network (GCN) is utilized to fuse the extracted temporal-structural features in a multi-feature modeling approach, which helps to improve the accuracy of anomaly detection. Finally, to identify the root cause of system faults, we conduct a coarse-grained diagnosis and exploration after classifying the fault data. We evaluate MTG_CD on the microservice benchmark SockShop, demonstrating its superiority over several baseline methods in detecting CPU usage overhead, memory leak, and network delay faults. The average macro F1 score improves by 14.05%.
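For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1 score is typically computed over the three fault classes named in the abstract. The function name `macro_f1`, the label strings, and the sample predictions are illustrative assumptions, not taken from the paper, and this is not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Assumed label names for the three fault types; the data is made up for illustration.
labels = ["cpu", "memory", "network"]
y_true = ["cpu", "cpu", "memory", "memory", "network", "network"]
y_pred = ["cpu", "memory", "memory", "memory", "network", "cpu"]
score = macro_f1(y_true, y_pred, labels)
```

Because each class contributes equally to the average, macro F1 does not let a frequent fault type mask poor performance on a rare one.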
Funder
Science and Technology Department of Sichuan Province
Ministry of Education Program
Publisher
Springer Science and Business Media LLC