Fault tolerance under UNIX

Published:1989-01 Issue:1 Volume:7 Page:1-24
ISSN:0734-2071
Container-title:ACM Transactions on Computer Systems
language:en
Short-container-title:ACM Trans. Comput. Syst.

Author:

Borg Anita¹, Blau Wolfgang², Graetsch Wolfgang³, Herrmann Ferdinand³, Oberle Wolfgang³

Affiliation:

1. Digital Equipment Corp., Palo Alto, CA

2. Tandem Computers GmbH, Frankfurt, W. Germany

3. Nixdorf Computer GmbH, Paderborn, W. Germany

Abstract

The initial design for a distributed, fault-tolerant version of UNIX based on three-way atomic message transmission was presented in an earlier paper [3]. The implementation effort then moved from Auragen Systems to Nixdorf Computer, where it was completed. This paper describes the working system, now known as the TARGON/32. The original design left open questions in at least two areas: fault tolerance for server processes and recovery after a crash were briefly and inaccurately sketched; rebackup after recovery was not discussed at all. The fundamental design involving three-way message transmission has remained unchanged. However, in addition to important changes in the implementation, server backup has been redesigned and is now more consistent with that of normal user processes. Recovery and rebackup have been completed in a less centralized and thus more efficient manner than previously envisioned. In this paper we review important aspects of the original design and note how the implementation differs from our original ideas. We then focus on the backup and recovery for server processes and the changes and additions in the design and implementation of recovery and rebackup.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/58564.58565

References: 15 articles.

1. ARNOW D. AND GLAZER S. A fast safe file system for UNIX. Unpublished paper written in 1984 for Auragen Systems Corp. Ft. Lee N.J.

2. The Recovery Manager of the System R Database Manager

3. Highly available systems for database applications

Cited by 87 articles.

1. Logging and Checkpointing;From Traditional Fault Tolerance to Blockchain;2021-05-19

2. Pervasive System Overview;Intelligent Systems Reference Library;2019-09-20

3. Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems;IEEE Transactions on Knowledge and Data Engineering;2019-04-01

4. C-RAM: Breaking Mobile Device Memory Barriers Using the Cloud;IEEE Transactions on Mobile Computing;2016-11-01

5. Fast failure recovery in distributed graph processing systems;Proceedings of the VLDB Endowment;2014-12