Hybrid parallelization strategies for large-scale machine learning in SystemML

Published:2014-03 Issue:7 Volume:7 Page:553-564
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Boehm Matthias¹,Tatikonda Shirish¹,Reinwald Berthold¹,Sen Prithviraj¹,Tian Yuanyuan¹,Burdick Douglas R.¹,Vaithyanathan Shivakumar¹

Affiliation:

1. IBM Research, Almaden, San Jose, CA

Abstract

SystemML aims at declarative, large-scale machine learning (ML) on top of MapReduce, where high-level ML scripts with R-like syntax are compiled to programs of MR jobs. The declarative specification of ML algorithms enables, in contrast to existing large-scale machine learning libraries, automatic optimization. SystemML's primary focus is on data parallelism, but many ML algorithms inherently exhibit opportunities for task parallelism as well. A major challenge is how to efficiently combine both types of parallelism for arbitrary ML scripts and workloads. In this paper, we present a systematic approach for combining task and data parallelism for large-scale machine learning on top of MapReduce. We employ a generic Parallel FOR construct (ParFOR) as known from high performance computing (HPC). Our core contributions are (1) complementary parallelization strategies for exploiting multi-core and cluster parallelism, as well as (2) a novel cost-based optimization framework for automatically creating optimal parallel execution plans. Experiments on a variety of use cases showed that this achieves both efficiency and scalability due to automatic adaptation to ad-hoc workloads and unknown data characteristics.
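
The idea of a parallel FOR loop whose iterations are independent tasks, each running a data-oriented kernel, can be sketched in plain Python. This is only an illustration of the general construct the abstract mentions, not SystemML's ParFOR or its optimizer: the helper names `parfor` and `column_sum_of_squares`, the thread pool, and the toy matrix are all assumptions made for this example. Note also that threads in CPython do not speed up CPU-bound work; the sketch shows the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def column_sum_of_squares(matrix, j):
    """Per-iteration kernel (hypothetical): reduce column j of a row-major matrix."""
    return sum(row[j] * row[j] for row in matrix)


def parfor(body, indices, workers=4):
    """Hypothetical parallel-for helper.

    Runs body(i) for every index concurrently and returns the results in
    index order. This is only valid when iterations do not depend on each
    other, which is the precondition for task parallelism.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, indices))


# Toy 2x3 matrix (made-up data for illustration).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# Task parallelism: one independent task per column.
parallel_result = parfor(lambda j: column_sum_of_squares(X, j), range(3))

# The equivalent serial loop, for comparison.
serial_result = [column_sum_of_squares(X, j) for j in range(3)]

print(parallel_result)  # [17.0, 29.0, 45.0]
```

In a system like the one the abstract describes, choosing the number of workers and whether the loop body or the loop itself should be distributed would be left to a cost-based optimizer; here `workers` is simply a fixed parameter.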

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/2732286.2732292

Cited by 58 articles.

1. Database Native Model Selection: Harnessing Deep Neural Networks in Database Systems;Proceedings of the VLDB Endowment;2024-01

2. Layer-wise partitioning and merging for efficient and scalable deep learning;Future Generation Computer Systems;2023-12

3. SAGA: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications;Proceedings of the ACM on Management of Data;2023-11-13

4. BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach;Proceedings of the ACM on Management of Data;2023-11-13

5. A systematic evaluation of machine learning on serverless infrastructure;The VLDB Journal;2023-09-20