Affiliation:
1. Kyushu University, Japan
Abstract
This chapter presents a study of workflow scheduling with fault tolerance. It first examines workflow scheduling and fault tolerance technologies separately, and then surveys related work that combines the two fields. These works are broadly classified into six categories, corresponding to six fault tolerance technologies: workflow scheduling with primary/backup, primary/backup with multiple backups, checkpointing, rescheduling, active replication, and active replication with dynamic replicas. An in-depth study of these six topics illustrates the challenges explored so far, such as overloading conditions and tradeoffs among scheduling criteria, and identifies some directions for future research. As applications grow increasingly complex and failures become a severe problem in large-scale systems, the authors aim to provide, through this work, a comprehensive review of the problem of workflow scheduling with fault tolerance.
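To make the first category (workflow scheduling with primary/backup) more concrete, here is a minimal sketch, not the chapter's method. Everything in it is a hypothetical assumption for illustration: the workflow format, the function name `primary_backup_schedule`, the earliest-start placement rule, identical processors, no communication delays, and at most one failed task copy. Each task gets a primary copy and a passive backup on a different processor; the backup is reserved to start only after its own primary finishes and after all predecessors' backups finish, so under these assumptions the workflow still completes if any single copy fails.

```python
from typing import Dict, List, Tuple

# Hypothetical workflow format: task name -> (execution time, predecessor names).
# Tasks must be listed in topological order (Python dicts keep insertion order).
Workflow = Dict[str, Tuple[float, List[str]]]
Slot = Tuple[int, float, float]  # (processor index, start time, finish time)


def primary_backup_schedule(tasks: Workflow, num_procs: int) -> Dict[str, Dict[str, Slot]]:
    """Place a primary and a passive backup copy of every task.

    Assumptions (illustrative only): identical processors, no communication
    delays, non-preemptive execution, slots appended at each processor's
    free time, and at most one failed task copy.
    """
    if num_procs < 2:
        raise ValueError("primary/backup needs at least two processors")
    free_at = [0.0] * num_procs  # time at which each processor becomes available
    schedule: Dict[str, Dict[str, Slot]] = {}

    for name, (cost, preds) in tasks.items():
        # Primary: ready once every predecessor's primary has finished.
        ready_p = max((schedule[p]["primary"][2] for p in preds), default=0.0)
        # Pick the processor giving the earliest start (lowest index on ties).
        proc_p = min(range(num_procs), key=lambda q: max(free_at[q], ready_p))
        start_p = max(free_at[proc_p], ready_p)
        finish_p = start_p + cost
        free_at[proc_p] = finish_p

        # Backup: waits for its own primary (it runs only if the primary fails)
        # and for every predecessor's backup, in case a failure cascaded there.
        ready_b = max([finish_p] + [schedule[p]["backup"][2] for p in preds])
        others = [q for q in range(num_procs) if q != proc_p]  # never share the primary's processor
        proc_b = min(others, key=lambda q: max(free_at[q], ready_b))
        start_b = max(free_at[proc_b], ready_b)
        free_at[proc_b] = start_b + cost  # reserve the slot even if it stays idle

        schedule[name] = {
            "primary": (proc_p, start_p, finish_p),
            "backup": (proc_b, start_b, start_b + cost),
        }
    return schedule


if __name__ == "__main__":
    # Small diamond-shaped workflow: A -> {B, C} -> D.
    wf: Workflow = {
        "A": (2.0, []),
        "B": (3.0, ["A"]),
        "C": (1.0, ["A"]),
        "D": (2.0, ["B", "C"]),
    }
    for task, copies in primary_backup_schedule(wf, num_procs=3).items():
        print(task, copies)
```

Note that every backup slot is reserved even though it normally stays idle; this wasted capacity is what the overloading conditions mentioned above are about, since backups whose primaries sit on different processors can, under a single-failure assumption, share the same reserved slot.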