Affiliation:
1. University of Oran1, Algeria
2. University of Es Senia Oran1, Algeria
Abstract
One of the most important prerequisites for using cloud environments more effectively is undoubtedly the study of the reliability and robustness of the services they provide. Fault tolerance is therefore necessary to ensure this reliability and to reduce SLA violations. Checkpointing is a popular fault tolerance technique in large-scale systems. However, its major drawback is the overhead caused by the time needed to store checkpoint files, which increases execution time and reduces the chance of meeting the desired deadlines. In this chapter, the authors propose a checkpointing strategy with lightweight storage. Storage is provided by creating a virtual topology, VRbIO, and by using an intelligent, fault-tolerant I/O technique, CSDS (collective and selective data sieving). The proposal is executed by active and reactive agents, and it solves many of the problems of checkpointing with standard I/O. To evaluate the approach, the authors compare it with checkpointing that uses ROMIO as its I/O strategy. Experimental results show the effectiveness and reliability of the proposed approach.
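As a rough illustration of the data sieving idea that the name CSDS refers to, the sketch below shows generic, single-process data sieving of the kind popularized by ROMIO. It is not the authors' CSDS, which is additionally collective and selective. Many small, noncontiguous read requests are served by a few large contiguous reads, and the requested pieces are then sliced out of the in-memory buffer. The function name `sieve_read` and the `max_buffer` parameter are hypothetical choices made for this example.

```python
import io
from typing import BinaryIO, List, Tuple


def sieve_read(f: BinaryIO, requests: List[Tuple[int, int]],
               max_buffer: int = 1 << 20) -> List[bytes]:
    """Hypothetical helper: serve (offset, length) reads via data sieving.

    Requests are grouped into windows no larger than max_buffer bytes
    (a single oversized request still gets its own window); each window
    is fetched with ONE contiguous read, including the unwanted holes
    between requests, and the wanted pieces are sliced out in memory.
    """
    results = [b""] * len(requests)
    # Visit requests in file-offset order so neighbouring pieces share a window.
    order = sorted(range(len(requests)), key=lambda i: requests[i][0])
    i = 0
    while i < len(order):
        start, length = requests[order[i]]
        end = start + length
        j = i + 1
        # Grow the window while the next request still fits in max_buffer.
        while j < len(order):
            off, ln = requests[order[j]]
            new_end = max(end, off + ln)
            if new_end - start > max_buffer:
                break
            end = new_end
            j += 1
        f.seek(start)
        buf = f.read(end - start)  # one large contiguous read, holes included
        for k in order[i:j]:
            off, ln = requests[k]
            results[k] = buf[off - start:off - start + ln]
        i = j
    return results


# Usage: three small reads; with a 32-byte window the first two
# requests (offsets 10 and 20) share one contiguous read.
data = bytes(range(256))
pieces = sieve_read(io.BytesIO(data), [(10, 4), (100, 2), (20, 3)], max_buffer=32)
```

The trade-off this illustrates is the one that makes sieving attractive for checkpoint I/O: it reads extra bytes (the holes) in exchange for far fewer I/O calls. A write variant would need a read-modify-write of each window, which is where locking and fault tolerance become concerns.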