Affiliation:
1. Univ. of Cambridge, Cambridge, UK
2. Univ. of Calgary, Calgary, Alberta, Canada
Abstract
The monitoring of distributed systems involves the collection, interpretation, and display of information concerning the interactions among concurrently executing processes. This information and its display can support the debugging, testing, performance evaluation, and dynamic documentation of distributed systems. General problems associated with monitoring are outlined in this paper, and the architecture of a general-purpose, extensible, distributed monitoring system is presented. Three approaches to the display of process interactions are described: textual traces, animated graphical traces, and a combination of aspects of the textual and graphical approaches. The roles that these approaches fulfill in monitoring and debugging distributed systems are identified and compared. Monitoring tools for collecting communication statistics, detecting deadlock, controlling the non-deterministic execution of distributed systems, and using protocol specifications in monitoring are also described.
Our discussion is based on experience in the development and use of a monitoring system within a distributed programming environment called Jade. Jade was developed within the Computer Science Department of the University of Calgary and is now being used to support teaching and research at a number of universities and research organizations.
Publisher
Association for Computing Machinery (ACM)
Cited by
106 articles.
1. Introduction to Next‐Generation Internet and Distributed Systems;Decentralized Systems and Distributed Computing;2024-07-15
2. Achieving Observability on Fog Computing with the Use of Open-Source Tools;Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering;2024
3. Global Message Ordering using Distributed Kafka Clusters;2023 15th International Conference on Innovations in Information Technology (IIT);2023-11-14
4. Concurrent runtime verification of data rich events;International Journal on Software Tools for Technology Transfer;2023-06-26
5. The view on systems monitoring and its requirements from future Cloud-to-Thing applications and infrastructures;Future Generation Computer Systems;2023-04