Author:
He Hongxia, Li Xi, Chen Peng, Chen Juan, Liu Ming, Wu Lei
Abstract
The cloud environment is a virtual, online, distributed computing environment that provides users with large-scale services, and cloud monitoring plays an integral role in protecting its infrastructure. Cloud monitoring systems must closely track the KPIs of cloud resources in order to detect anomalies accurately. However, because the cloud environment is complex and highly dynamic, anomaly detection for KPIs with varied patterns and data quality is a major challenge, especially for massive volumes of unlabeled data. It is also difficult to improve the accuracy of existing anomaly detection methods. To address these problems, we propose a novel Dynamic Graph Transformer based Parallel Framework (DGT-PF) for efficiently detecting system anomalies in cloud infrastructures. It uses a Transformer with an anomaly attention mechanism and a Graph Neural Network (GNN) to learn the spatio-temporal features of KPIs, improving both the accuracy and the timeliness of anomaly detection. Specifically, we propose an effective dynamic relationship embedding strategy that dynamically learns spatio-temporal features and adaptively generates adjacency matrices, and we soft-cluster each GNN layer through a Diffpooling module. In addition, we run a nonlinear neural network model and an AR-MLP model in parallel to obtain better detection accuracy and performance. Experiments show that the DGT-PF framework achieves the highest F1-Score on 5 public datasets, with an average improvement of 21.6% compared to 11 anomaly detection models.
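The abstract's two graph-side ideas, adaptively generated adjacency matrices and soft clustering of GNN layers, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ReLU-plus-softmax construction over node embeddings is one commonly used way to build a learned adjacency matrix and is assumed here, the pooling step follows the general differentiable pooling formulation (a soft assignment S, pooled features S^T Z, pooled adjacency S^T A S), and all names, sizes, and random weights are hypothetical stand-ins for trained parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def adaptive_adjacency(src_emb, dst_emb):
    """Build a row-normalised adjacency matrix from two node embeddings.

    Assumed construction: A = softmax(relu(E_src @ E_dst^T)), row-wise.
    """
    scores = np.maximum(src_emb @ dst_emb.T, 0.0)
    return softmax(scores, axis=1)


def soft_pool(adj, feats, w_embed, w_assign):
    """One soft-clustering (differentiable pooling style) step.

    Z = tanh(A X W_embed)        node embeddings
    S = softmax(A X W_assign)    soft cluster assignment per node
    returns (S^T A S, S^T Z, S)
    """
    z = np.tanh(adj @ feats @ w_embed)
    s = softmax(adj @ feats @ w_assign, axis=1)
    return s.T @ adj @ s, s.T @ z, s


# Hypothetical sizes: 6 KPIs as graph nodes, pooled into 2 clusters.
n_kpis, emb_dim, feat_dim, hidden_dim, n_clusters = 6, 4, 8, 5, 2
rng = np.random.default_rng(0)

# Random values stand in for parameters that would normally be learned.
src_emb = rng.normal(size=(n_kpis, emb_dim))
dst_emb = rng.normal(size=(n_kpis, emb_dim))
feats = rng.normal(size=(n_kpis, feat_dim))
w_embed = rng.normal(size=(feat_dim, hidden_dim))
w_assign = rng.normal(size=(feat_dim, n_clusters))

adj = adaptive_adjacency(src_emb, dst_emb)
pooled_adj, pooled_feats, assign = soft_pool(adj, feats, w_embed, w_assign)
```

In this sketch each KPI is a graph node, `adj` is a dense 6 x 6 matrix whose rows sum to 1, and `assign` gives every KPI a probability distribution over the 2 clusters, so the pooled graph has one node per cluster. In a trained model the embeddings and weight matrices would be updated by gradient descent rather than drawn at random.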
Funder
the National Natural Science Foundation under Grant
Science and Technology Program of Sichuan Province under Grant
Publisher
Springer Science and Business Media LLC
Cited by
10 articles.