Author:
Bennett, Andrew J.; Kelly, Paul H. J.; Paterson, Ross A.
Abstract
This paper is an exploration of the parallel graph reduction approach to parallel functional
programming, illustrated by a particular example: a pipelined, dynamically scheduled implementation
of searches, updates and read-modify-write transactions on an in-store binary search
tree. We use program transformation, execution-driven simulation and analytical modelling
to expose the maximum potential parallelism, the minimum communication and synchronisation
overheads, and to control the overall space requirement. We begin with a lazy functional
program specifying a series of transactions on a binary tree, each involving several searches
and updates, in a side-effect-free fashion. Transformation of the source code produces a
formulation of the program with greater locality and larger grain size than can be achieved
using naive parallelisation methods, and we show that, with care, these tasks can be scheduled
effectively. Even with a workload using random keys, significant spatial locality is found, and
we evaluate a modified cache coherency protocol which avoids false sharing so that large
cache lines can be used to minimise the number of messages required. As expected with
a pipeline, the application should reach a steady state as soon as the first transaction is
completed. However, if the network latency is too large, the rate of completion lags behind
the rate at which work is admitted, and internal queues grow without bound. We determine
the conditions under which this occurs, and show how it can be avoided while maximising
speedup.
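To make the kind of program the abstract describes concrete, the following is a minimal sketch (not the authors' code) of a side-effect-free binary search tree in Haskell: each update returns a new tree that shares unmodified subtrees with the old one, so searches, updates and read-modify-write transactions compose without destructive assignment. The names `Tree`, `search`, `update` and `rmw` are illustrative assumptions, not identifiers from the paper.

```haskell
-- A purely functional binary search tree: no node is ever mutated.
data Tree k v = Leaf | Node (Tree k v) k v (Tree k v)

-- Search: descend by key comparison.
search :: Ord k => k -> Tree k v -> Maybe v
search _ Leaf = Nothing
search k (Node l k' v r)
  | k < k'    = search k l
  | k > k'    = search k r
  | otherwise = Just v

-- Update (insert-or-replace): rebuilds only the root-to-key path,
-- sharing every untouched subtree with the input tree.
update :: Ord k => k -> v -> Tree k v -> Tree k v
update k v Leaf = Node Leaf k v Leaf
update k v (Node l k' v' r)
  | k < k'    = Node (update k v l) k' v' r
  | k > k'    = Node l k' v' (update k v r)
  | otherwise = Node l k' v r

-- A read-modify-write transaction: read the current value,
-- apply a function to it, and write the result back.
rmw :: Ord k => k -> (Maybe v -> v) -> Tree k v -> Tree k v
rmw k f t = update k (f (search k t)) t
```

Because each transaction is an ordinary function from tree to tree, a series of transactions is just function composition, which is what makes the pipelined, dynamically scheduled evaluation studied in the paper possible in the first place.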
Publisher
Cambridge University Press (CUP)