The Diagnosis-Effective Sampling of Application Traces
Published: 2024-07-02
Volume: 14
Issue: 13
Page: 5779
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Author:
Poghosyan Arnak (1, 2), Harutyunyan Ashot (3), Davtyan Edgar (4), Petrosyan Karen (2), Baloian Nelson (5)
Affiliation:
1. Institute of Mathematics NAS Armenia, Yerevan 0019, Armenia
2. College of Science and Engineering, American University of Armenia, Yerevan 0019, Armenia
3. ML Laboratory, Yerevan State University, Yerevan 0025, Armenia
4. Picsart, Miami, FL 33009, USA
5. Department of Computer Science, University of Chile, Santiago 8330111, Chile
Abstract
Distributed tracing is a cutting-edge technology for monitoring, managing, and troubleshooting cloud-native applications. It offers more comprehensive and continuous observability than traditional logging and is indispensable for navigating modern, complex software architectures. However, distributed applications generate a staggering volume of traces, and storing and using every trace directly is impractical because of the associated operational costs. This necessitates a sampling strategy that selects which traces warrant storage and analysis. Historically, sampling methods have been rate-based and have often relied heavily on manual configuration. A more intelligent approach is needed, and we propose a hierarchical sampling methodology that addresses multiple requirements concurrently. Initial rate-based sampling mitigates the overwhelming volume of traces, as no further analysis can be performed at this level. The next stage builds on this foundation to support more nuanced analysis, incorporating information about trace properties and ensuring that vital process details are preserved even under extreme conditions. This comprehensive approach not only aids in the visualization and conceptualization of applications but also enables more targeted analysis in later stages. Deeper in the sampling hierarchy, the technique becomes tailored to specific purposes, such as simplifying application troubleshooting. In this context, the sampling strategy prioritizes the retention of erroneous traces from dominant processes, thus facilitating the identification and resolution of underlying issues. The focus of this paper is to reveal the impact of sampling on troubleshooting efficiency. Intelligent and explainable artificial intelligence solutions enable the detection of malfunctioning microservices and provide transparent insights into root causes. We advocate for using rule-induction systems, which offer explainability and efficacy in decision-making processes. By integrating advanced sampling techniques with machine-learning-driven intelligence, we empower organizations to navigate the complexities of large-scale distributed cloud environments effectively.
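The three-stage hierarchy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trace` fields, the per-type cap in stage 2, the rule of keeping every erroneous trace in stage 3, and all parameter names and default values are assumptions chosen only to make the idea concrete.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    # Hypothetical trace record; the field names are illustrative assumptions.
    trace_id: int
    trace_type: str   # e.g. a label derived from the trace's structure
    has_error: bool


def rate_sample(traces, rate, rng):
    """Stage 1: keep each trace independently with probability `rate`."""
    return [t for t in traces if rng.random() < rate]


def type_aware_sample(traces, per_type_cap, rng):
    """Stage 2 (assumed rule): keep at most `per_type_cap` traces per type,
    so that rare trace types are not drowned out by dominant ones."""
    by_type = defaultdict(list)
    for t in traces:
        by_type[t.trace_type].append(t)
    kept = []
    for group in by_type.values():
        if len(group) <= per_type_cap:
            kept.extend(group)
        else:
            kept.extend(rng.sample(group, per_type_cap))
    return kept


def error_focused_sample(traces, normal_rate, rng):
    """Stage 3 (assumed rule): keep every erroneous trace and keep normal
    traces only with probability `normal_rate`."""
    return [t for t in traces if t.has_error or rng.random() < normal_rate]


def hierarchical_sample(traces, rate=0.5, per_type_cap=100,
                        normal_rate=0.1, seed=0):
    """Chain the three stages; a fixed seed makes the sketch reproducible."""
    rng = random.Random(seed)
    stage1 = rate_sample(traces, rate, rng)
    stage2 = type_aware_sample(stage1, per_type_cap, rng)
    return error_focused_sample(stage2, normal_rate, rng)


if __name__ == "__main__":
    demo = [Trace(i, "checkout", i % 10 == 0) for i in range(1000)]
    demo.append(Trace(1000, "rare-admin-op", True))
    sampled = hierarchical_sample(demo, rate=0.5, per_type_cap=50)
    print(len(demo), "traces reduced to", len(sampled))
```

Each stage narrows the output of the previous one, which mirrors the abstract's ordering: volume reduction first, then property-aware selection, then a purpose-specific filter for troubleshooting.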
Funder
ADVANCE Research Grants from the Foundation for Armenian Science and Technology