RT-ZooKeeper: Taming the Recovery Latency of a Coordination Service

Published: 2021-10-31 Issue: 5s Volume: 20 Pages: 1-22
ISSN: 1539-9087
Container-title: ACM Transactions on Embedded Computing Systems
Language: en
Short-container-title: ACM Trans. Embed. Comput. Syst.

Author:

Li Haoran¹, Lu Chenyang¹, Gill Christopher D.¹

Affiliation:

1. Cyber-Physical Systems Laboratory, Washington University in St. Louis, St. Louis, MO, USA

Abstract

Fault-tolerant coordination services have been widely used in distributed applications in cloud environments. Recent years have witnessed the emergence of time-sensitive applications deployed in edge computing environments, which introduces both challenges and opportunities for coordination services. On the one hand, coordination services must recover from failures in a timely manner. On the other hand, edge computing employs local networked platforms that can be exploited to achieve timely recovery. In this work, we first identify the limitations of the leader election and recovery protocols underlying Apache ZooKeeper, the prevailing open-source coordination service. To reduce recovery latency from leader failures, we then design RT-ZooKeeper with a set of novel features, including a fast-convergence election protocol, a quorum channel notification mechanism, and a distributed epoch persistence protocol. We have implemented RT-ZooKeeper based on ZooKeeper version 3.5.8. Empirical evaluation shows that RT-ZooKeeper achieves a 91% reduction in maximum recovery latency in comparison to ZooKeeper. Furthermore, a case study demonstrates that fast failure recovery in RT-ZooKeeper can benefit a common messaging service like Kafka in terms of message latency.

Funder

NSF

Fullgraf Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3477034

References: 39 articles.

Cited by 12 articles.

1. Neural architecture search for image super-resolution: A review on the emerging state-of-the-art;Neurocomputing;2024-12

2. Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30

3. Neural architecture search for in-memory computing-based deep learning accelerators;Nature Reviews Electrical Engineering;2024-05-20

4. AlterEgo;Proceedings of the 7th International Workshop on Edge Systems, Analytics and Networking;2024-04-22

5. AutoML: A systematic review on automated machine learning with neural architecture search;Journal of Information and Intelligence;2024-01