Affiliation:
1. Monash University, Australia
Abstract
Wide-area distributed systems offer new opportunities for executing large-scale scientific applications. On these systems, communication mechanisms must cope with dynamic resource availability and with possible resource and network failures. Connectivity losses can disrupt the execution of workflow applications, which require reliable data transport between components. We present the design and implementation of p-channels, an asynchronous, fault-tolerant pipe mechanism suitable for coupling workflow components. Fault tolerance is achieved through persistence: pipe segments are cached adaptively while data is still streamed directly between components. We present the distributed algorithm that implements (a) caching of pipe data segments, (b) asynchronous read operations, and (c) communication state transfer to handle processes that join and leave dynamically.