Affiliation:
1. Microsoft Research, Redmond
Abstract
The Farsite file system is a storage service that runs on the desktop computers of a large organization and provides the semantics of a central NTFS file server. The motivation behind the Farsite project was to harness the unused storage and network resources of desktop computers to provide a service that is reliable, available, and secure despite the fact that it runs on machines that are unreliable, often unavailable, and of limited security. A main premise of the project has been that building a scalable system requires more than scalable algorithms: To be scalable in a practical sense, a distributed system targeting 10^5 nodes must tolerate a significant (and never-zero) rate of machine failure, a small number of malicious participants, and a substantial number of opportunistic participants. It also must automatically adapt to the arrival and departure of machines and changes in machine availability, and it must be able to autonomically repartition its data and metadata as necessary to balance load and alleviate hotspots. We describe the history of the project, including its multiple versions of major system components, the unique programming style and software-engineering environment we created to facilitate development, our distributed debugging framework, and our experiences with formal system specification. We also report on the lessons we learned during this development.
Publisher
Association for Computing Machinery (ACM)
Cited by
16 articles.
1. I4;Proceedings of the 27th ACM Symposium on Operating Systems Principles;2019-10-27
2. Asphalion: trustworthy shielding against Byzantine faults;Proceedings of the ACM on Programming Languages;2019-10-10
3. Towards Automatic Inference of Inductive Invariants;Proceedings of the Workshop on Hot Topics in Operating Systems;2019-05-13
4. Velisarios: Byzantine Fault-Tolerant Protocols Powered by Coq;Programming Languages and Systems;2018
5. EventML: Specification, verification, and implementation of crash-tolerant state machine replication systems;Science of Computer Programming;2017-11