Combining Distributed and Kernel Tracing for Performance Analysis of Cloud Applications

Author:

Gelle Loïc, Ezzati-Jivan Naser, Dagenais Michel R.

Abstract

Distributed tracing tracks user requests that span multiple services and machines in a distributed application. However, typical cloud applications rely on abstraction layers that can hide the root cause of latency arising between processes or in the kernel. Because they focus on high-level events, existing distributed tracing methodologies can be limited when it comes to detecting complex contentions and relating them back to the originating requests. Cross-level analyses that include kernel-level events are necessary to debug problems as prevalent as mutex or disk contention. However, associating kernel events with distributed tracing data is complex and can add considerable overhead. This paper describes a new solution that combines distributed tracing with low-level software tracing to better identify the root cause of latency. We explain how we achieve a hybrid trace collection that captures and synchronizes both kernel and distributed request events. We then present our design and implementation of a critical path analysis. We show that our analysis describes precisely how each request spends its time and what stands in its critical path, while limiting overhead.

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering


Cited by 6 articles.

1. Vnode: Low-Overhead Transparent Tracing of Node.js-Based Microservice Architectures;Future Internet;2023-12-29

2. Transparent Trace Annotation for Performance Debugging in Microservice-oriented Systems (Work In Progress Paper);Companion of the 2023 ACM/SPEC International Conference on Performance Engineering;2023-04-15

3. Message flow analysis with complex causal links for distributed ROS 2 systems;Robotics and Autonomous Systems;2023-03

4. Dynamic Application Call Graph Formation and Service Identification in Cloud Data Centers;IEEE Transactions on Network and Service Management;2023-03

5. Provenance-enhanced Root Cause Analysis for Jupyter Notebooks;2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing (UCC);2022-12
