The Diagnosis-Effective Sampling of Application Traces

Author:

Poghosyan Arnak (1, 2), Harutyunyan Ashot (3), Davtyan Edgar (4), Petrosyan Karen (2), Baloian Nelson (5)

Affiliation:

1. Institute of Mathematics NAS Armenia, Yerevan 0019, Armenia

2. College of Science and Engineering, American University of Armenia, Yerevan 0019, Armenia

3. ML Laboratory, Yerevan State University, Yerevan 0025, Armenia

4. Picsart, Miami, FL 33009, USA

5. Department of Computer Science, University of Chile, Santiago 8330111, Chile

Abstract

Distributed tracing is a cutting-edge technology for monitoring, managing, and troubleshooting cloud-native applications. It offers more comprehensive and continuous observability than traditional logging and is indispensable for navigating modern, complex software architectures. However, distributed applications generate a staggering volume of traces, and storing and using every trace directly is impractical because of the associated operational costs. This calls for a sampling strategy that selects which traces warrant storage and analysis. Historically, sampling has been rate-based and has often relied heavily on manual configuration. A more intelligent approach is needed, and we propose a hierarchical sampling methodology that addresses multiple requirements concurrently. Initial rate-based sampling mitigates the overwhelming volume of traces, since no further analysis can be performed at this level. The next stage builds on that foundation to support more nuanced analysis, incorporating information about trace properties and ensuring that vital process details are preserved even under extreme conditions. This comprehensive approach not only aids in visualizing and conceptualizing applications but also enables more targeted analysis in later stages. Deeper in the sampling hierarchy, the technique becomes tailored to specific purposes, such as simplifying application troubleshooting. In this context, the sampling strategy prioritizes the retention of erroneous traces from dominant processes, which facilitates the identification and resolution of underlying issues. The focus of this paper is to reveal the impact of sampling on troubleshooting efficiency. Intelligent and explainable artificial intelligence solutions enable the detection of malfunctioning microservices and provide transparent insights into root causes. We advocate for rule-induction systems, which offer both explainability and efficacy in decision-making. By integrating advanced sampling techniques with machine-learning-driven intelligence, we empower organizations to navigate the complexities of large-scale distributed cloud environments effectively.
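The three sampling levels described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Trace` fields, the per-type cap, the `top_k` notion of a "dominant" process, and all function names are assumptions introduced only to make the hierarchy concrete.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    # Hypothetical minimal trace record; real traces carry spans, tags, timings.
    trace_id: int
    trace_type: str  # assumed label for the process a trace belongs to
    is_error: bool


def rate_based(traces, rate, rng):
    """Level 1: keep each trace independently with probability `rate`."""
    return [t for t in traces if rng.random() < rate]


def type_aware(traces, per_type_cap, rng):
    """Level 2: cap every trace type so rare types survive next to frequent ones."""
    by_type = defaultdict(list)
    for t in traces:
        by_type[t.trace_type].append(t)
    kept = []
    for group in by_type.values():
        if len(group) <= per_type_cap:
            kept.extend(group)  # rare type: keep everything
        else:
            kept.extend(rng.sample(group, per_type_cap))
    return kept


def error_focused(traces, dominant_types):
    """Level 3: keep only erroneous traces that belong to dominant types."""
    return [t for t in traces if t.is_error and t.trace_type in dominant_types]


def hierarchical_sample(traces, rate=0.5, per_type_cap=100, top_k=2, seed=0):
    """Run the three levels in order and return the output of each level."""
    rng = random.Random(seed)
    level1 = rate_based(traces, rate, rng)
    # Assumption: "dominant" means the top_k most frequent types after level 1.
    counts = Counter(t.trace_type for t in level1)
    dominant = {name for name, _ in counts.most_common(top_k)}
    level2 = type_aware(level1, per_type_cap, rng)
    level3 = error_focused(level2, dominant)
    return level1, level2, level3
```

Each level consumes the output of the previous one, so the troubleshooting-oriented level only sees traces that already passed the volume and type-coverage filters. In this sketch, dominance is computed before the per-type cap because the cap flattens the type frequencies.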

Funder

ADVANCE Research Grants from the Foundation for Armenian Science and Technology

Publisher

MDPI AG

