
SystemML

Published:2016-09 Issue:13 Volume:9 Page:1425-1436
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Boehm Matthias¹,Dusenberry Michael W.²,Eriksson Deron²,Evfimievski Alexandre V.¹,Manshadi Faraz Makari¹,Pansare Niketan¹,Reinwald Berthold¹,Reiss Frederick R.³,Sen Prithviraj¹,Surve Arvind C.²,Tatikonda Shirish¹

Affiliation:

1. IBM Research - Almaden

2. IBM Spark Technology Center

3. IBM Research - Almaden and IBM Spark Technology Center

Abstract

The rising need for custom machine learning (ML) algorithms and the growing data sizes that require the exploitation of distributed, data-parallel frameworks such as MapReduce or Spark pose significant productivity challenges to data scientists. Apache SystemML addresses these challenges through declarative ML by (1) increasing the productivity of data scientists as they are able to express custom algorithms in a familiar domain-specific language covering linear algebra primitives and statistical functions, and (2) transparently running these ML algorithms on distributed, data-parallel frameworks by applying cost-based compilation techniques to generate efficient, low-level execution plans with in-memory single-node and large-scale distributed operations. This paper describes SystemML on Apache Spark, end to end, including insights into various optimizer and runtime techniques as well as performance characteristics. We also share lessons learned from porting SystemML to Spark and declarative ML in general. Finally, SystemML is open-source, which allows the database community to leverage it as a testbed for further research.

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3007263.3007279

Cited by 138 articles.

1. AdaptMD: Balancing Space and Performance in NUMA Architectures With Adaptive Memory Deduplication;IEEE Transactions on Computers;2024-06

2. Compression and In-Situ Query Processing for Fine-Grained Array Lineage;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

3. DMRNet: Effective Network for Accurate Discharge Medication Recommendation;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

4. Blaze: Holistic Caching for Iterative Data Processing;Proceedings of the Nineteenth European Conference on Computer Systems;2024-04-22

5. InferDB: In-Database Machine Learning Inference Using Indexes;Proceedings of the VLDB Endowment;2024-04