Distributed Storage Systems for Data Intensive Computing
Page:95-117
ISSN:2327-3453
Container-title:Advances in Systems Analysis, Software Engineering, and High Performance Computing
Author:
Vazhkudai, Sudharshan S. (1); Butt, Ali R. (2); Ma, Xiaosong (3)
Affiliation:
1. Oak Ridge National Laboratory, USA
2. Virginia Polytechnic Institute and State University, USA
3. North Carolina State University, USA
Abstract
In this chapter, the authors present an overview of the utility of distributed storage systems in supporting modern applications that are increasingly data-intensive. Their coverage of distributed storage systems is driven by the requirements imposed by data-intensive computing rather than being a mere summary of storage systems. To this end, they delve into several aspects of supporting data-intensive analysis, such as data staging, offloading, checkpointing, and end-user access to terabytes of data, and illustrate the use of novel techniques and methodologies for realizing distributed storage systems therein. The data deluge from scientific experiments, observations, and simulations affects all of these day-to-day operations in data-intensive computing. Modern distributed storage systems employ techniques that can help improve application performance, alleviate I/O bandwidth bottlenecks, mask failures, and improve data availability. The authors present key guiding principles involved in the construction of such storage systems, along with the associated tradeoffs, design, and architecture, all with an eye toward addressing the challenges of data-intensive scientific applications. They highlight the concepts involved using several case studies of state-of-the-art storage systems currently available in the data-intensive computing landscape.