Affiliation:
1. Sun Yat-sen University, Guangzhou, China
2. Huawei, Shenzhen, China
3. Huawei, Beijing, China
Abstract
Distributed tracing has been widely adopted in microservice systems and plays an important role in monitoring and analyzing them. However, trace data often come in large volumes, incurring substantial computational and storage costs. To reduce the quantity of traces, trace sampling has become a prominent topic, and several methods have been proposed in prior work. To attain higher-quality sampling outcomes, biased sampling has gained more attention than random sampling. Previous biased sampling methods primarily judged the importance of traces by diversity, aiming to sample more edge-case traces and fewer common-case traces. However, we contend that relying solely on trace diversity for sampling is insufficient: system runtime state is another crucial factor that needs to be considered, especially in cases of system failure. In this study, we introduce TraStrainer, an online sampler that takes into account both system runtime state and trace diversity. TraStrainer employs an interpretable and automated encoding method to represent traces as vectors. Simultaneously, it adaptively determines sampling preferences by analyzing system runtime metrics. When sampling, it combines the results of the system-biased and diversity-biased samplers through a dynamic voting mechanism. Experimental results demonstrate that TraStrainer achieves higher-quality sampling results and significantly improves the performance of downstream root cause analysis (RCA) tasks. It yields an average increase of 32.63% in Top-1 RCA accuracy over four baselines across two datasets.
Funder
Guangdong Basic and Applied Basic Research Foundation
Publisher
Association for Computing Machinery (ACM)