Affiliation:
1. Universitat Rovira i Virgili, Tarragona, Spain
2. Télécom SudParis, Palaiseau, France
3. Universitat Rovira i Virgili, Tarragona, Spain and IBM T.J. Watson Research Center, Yorktown Heights, NY
Abstract
Serverless computing greatly simplifies the use of cloud resources. In particular, Function-as-a-Service (FaaS) platforms enable programmers to develop applications as individual functions that can run and scale independently. Unfortunately, applications that require fine-grained support for mutable state and synchronization, such as machine learning (ML) and scientific computing, are notoriously hard to build with this new paradigm. In this work, we aim to bridge this gap. We present Crucial, a system to program highly parallel stateful serverless applications. Crucial retains the simplicity of serverless computing. It is built upon the key insight that FaaS resembles concurrent programming at the scale of a datacenter. Accordingly, a distributed shared memory layer is the natural answer to the need for fine-grained state management and synchronization. Crucial makes it possible to port a multi-threaded code base to serverless effortlessly, where it can benefit from the scalability and pay-per-use model of FaaS platforms. We validate Crucial with the help of micro-benchmarks and by considering various stateful applications. Beyond classical parallel tasks (e.g., a Monte Carlo simulation), these applications include representative ML algorithms such as k-means and logistic regression. Our evaluation shows that Crucial obtains superior or comparable performance to Apache Spark at similar cost (18%–40% faster). We also use Crucial to port (part of) a state-of-the-art multi-threaded ML library to serverless. The ported application is up to 30% faster than with a dedicated high-end server. Finally, we attest that Crucial can rival the performance of a single-machine, multi-threaded implementation of a complex coordination problem. Overall, Crucial delivers all these benefits with less than 6% of changes in the code bases of the evaluated applications.
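To make the "FaaS resembles concurrent programming" insight concrete, here is a minimal sketch of the kind of multi-threaded program the abstract has in mind: a Monte Carlo estimate of π in which several threads update one shared counter. It uses only plain `java.util.concurrent` and is not Crucial's API; the class name `MonteCarloPi` and the method `estimate` are hypothetical names chosen for illustration. In a port of the kind the abstract describes, each thread would presumably become a cloud function and the shared counter would live in the distributed shared memory layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: plain Java threads, not the Crucial API.
public class MonteCarloPi {

    // Draw `samples` random points in the unit square and count those
    // that fall inside the quarter circle of radius 1.
    private static long countHits(long samples) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                hits++;
            }
        }
        return hits;
    }

    // Run `workers` threads; each adds its local hit count to one shared counter.
    public static double estimate(int workers, long samplesPerWorker) {
        AtomicLong totalHits = new AtomicLong(); // the shared mutable state
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> totalHits.addAndGet(countHits(samplesPerWorker)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // synchronization point: wait for every worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // The quarter circle covers pi/4 of the unit square.
        return 4.0 * totalHits.get() / (workers * samplesPerWorker);
    }

    public static void main(String[] args) {
        System.out.println(estimate(4, 1_000_000));
    }
}
```

The two pieces that matter are the shared `AtomicLong` (fine-grained mutable state) and the `join` calls (synchronization). These are the parts that stateless functions alone cannot express, and the need a shared memory layer is meant to fill.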
Funder
EU Horizon 2020 programme
Spanish Government
Publisher
Association for Computing Machinery (ACM)
Cited by
14 articles.
1. Orchestration and Management of Adaptive IoT-Centric Distributed Applications;IEEE Internet of Things Journal;2024-02-01
2. MLLess: Achieving cost efficiency in serverless machine learning training;Journal of Parallel and Distributed Computing;2024-01
3. A Survey of Actor-Like Programming Models for Serverless Computing;Lecture Notes in Computer Science;2024
4. Glider;Proceedings of the 24th International Middleware Conference;2023-11-27
5. SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture;2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS);2023-10-22