FlashR

Published:2018-03-23 Issue:1 Volume:53 Page:183-194
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Zheng Da¹, Mhembere Disa¹, Vogelstein Joshua T.¹, Priebe Carey E.¹, Burns Randal¹

Affiliation:

1. Johns Hopkins University

Abstract

R is one of the most popular programming languages for statistics and machine learning, but it is slow and unable to scale to large datasets. The usual way to obtain an efficient algorithm in R is to implement it in C or FORTRAN and provide an R wrapper. FlashR accelerates and scales existing R code by parallelizing a large number of matrix functions in the R base package and scaling them beyond memory capacity with solid-state drives (SSDs). FlashR performs memory-hierarchy-aware execution to speed up parallelized R code by (i) evaluating matrix operations lazily, (ii) performing all operations in a DAG in a single execution and with only one pass over the data to increase the ratio of computation to I/O, and (iii) performing two levels of matrix partitioning and reordering computation on matrix partitions to reduce data movement in the memory hierarchy. We evaluate FlashR on various machine learning and statistics algorithms on inputs of up to four billion data points. Despite the huge performance gap between SSDs and RAM, FlashR on SSDs closely tracks the performance of FlashR in memory for many algorithms. The R implementations in FlashR outperform H2O and Spark MLlib by a factor of 3 to 20.
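
The lazy evaluation and single-pass DAG execution described in points (i) and (ii) can be illustrated with a minimal sketch. This is not FlashR's API or implementation: the `Lazy` class, `source`, `fused_sum`, and `chunked` are hypothetical names invented for illustration, NumPy stands in for R matrices, and in-memory chunks stand in for partitions read from SSDs. The sketch only shows the general idea that operators build a graph instead of computing, and that the whole graph is then evaluated chunk by chunk so each chunk of input is read once.

```python
import numpy as np


class Lazy:
    """Node in a DAG of deferred element-wise operations (hypothetical, not FlashR's API)."""

    def __init__(self, fn, parents=()):
        self.fn = fn            # function applied when the node is evaluated
        self.parents = parents  # input nodes; empty for leaves

    def __add__(self, other):
        return Lazy(np.add, (self, _wrap(other)))

    def __mul__(self, other):
        return Lazy(np.multiply, (self, _wrap(other)))

    def eval_chunk(self, chunk):
        # Leaves map the raw chunk to a value; inner nodes combine their parents.
        if not self.parents:
            return self.fn(chunk)
        return self.fn(*(p.eval_chunk(chunk) for p in self.parents))


def _wrap(value):
    # Turn a Python scalar into a constant leaf; pass Lazy nodes through.
    return value if isinstance(value, Lazy) else Lazy(lambda chunk, c=value: c)


def source():
    # Leaf that stands for the input data itself.
    return Lazy(lambda chunk: chunk)


def chunked(data, rows):
    # Stand-in for reading fixed-size partitions from slower storage.
    for start in range(0, len(data), rows):
        yield data[start:start + rows]


def fused_sum(node, chunks):
    """Evaluate the whole DAG per chunk and accumulate a sum, reading each chunk once."""
    total = 0.0
    for chunk in chunks:
        total += float(node.eval_chunk(chunk).sum())
    return total


x = source()
expr = x * x + 1.0  # builds a DAG only; no arithmetic happens yet
data = np.arange(10, dtype=np.float64)
result = fused_sum(expr, chunked(data, rows=4))
```

Here `result` equals the sum of `data * data + 1.0`, but the full intermediate array `data * data` is never materialized; only chunk-sized temporaries exist at any time. That is the property that lets an approach like this raise the ratio of computation to I/O when data lives on slower storage.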

Funder

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3200691.3178501

References: 39 articles.

1. SystemML

2. Hybrid parallelization strategies for large-scale machine learning in SystemML

3. A domain-specific approach to heterogeneous parallelism

4. Automatic Parallelization of Array-oriented Programs for a Multi-core Machine

Cited by 1 article.

1. Eliminating accidental deviations to minimize generalization error and maximize replicability: Applications in connectomics and genomics;PLOS Computational Biology;2021-09-16