Vectorization of apply to reduce interpretation overhead of R

Author:

Wang Haichuan (1), Padua David (1), Wu Peng (2)

Affiliation:

1. University of Illinois at Urbana-Champaign, USA

2. Huawei Lab, USA

Abstract

R is a popular dynamic language designed for statistical computing. Despite R's huge user base, the inefficiency of R's language implementation has become a major pain point in everyday use as well as an obstacle to applying R to large-scale analytics problems. The two most common approaches to improving the performance of dynamic languages are implementing more efficient interpretation strategies and extending the interpreter with a Just-In-Time (JIT) compiler. However, both approaches require significant changes to the interpreter and, as a result, complicate adoption by development teams. This paper presents a new approach to improving the execution efficiency of R programs by vectorizing the widely used Apply class of operations. Apply accepts two parameters: a function and a collection of input data elements. The standard implementation of Apply iteratively invokes the input function on each element of the data collection. Our approach combines data transformation and function vectorization to convert the looping-over-data execution of the standard Apply into a single invocation of a vectorized function that contains a sequence of vector operations over the input data. This conversion can significantly speed up the execution of Apply operations in R by reducing the number of interpretation steps. We implemented the vectorization transformation as an R package. To enable the optimization, all that is needed is to invoke the package; the user can continue to use a normal R interpreter without any changes. The evaluation shows that the proposed method delivers significant performance improvements for a collection of data analysis algorithm benchmarks. This is achieved without any native code generation and using only a single thread of execution.
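The idea described in the abstract can be illustrated with a small hand-written sketch in base R. This is not the paper's package or its API: the function `dist2`, the sample data, and the manual transformation steps are all hypothetical, chosen only to show how a per-element Apply might be turned into a single call over vectors.

```r
# Hypothetical per-element function: squared distance of a 2-D point
# from the origin. The names and data here are illustrative only.
dist2 <- function(p) p$x^2 + p$y^2

# Input collection: a list of records, one element per point.
points <- list(
  list(x = 1, y = 2),
  list(x = 3, y = 4),
  list(x = 5, y = 6)
)

# Standard Apply: the interpreter invokes dist2 once per element.
looped <- sapply(points, dist2)

# Data transformation (done by hand in this sketch): turn the list of
# records into a single record whose fields are vectors.
points_vec <- list(
  x = sapply(points, function(p) p$x),
  y = sapply(points, function(p) p$y)
)

# Function vectorization: because ^ and + are elementwise in R, the same
# body can be evaluated once over whole vectors instead of once per point.
vectorized <- dist2(points_vec)

# Both forms compute the same values.
stopifnot(isTRUE(all.equal(looped, vectorized)))
```

In this sketch the function body happens to be vectorizable as written; a function containing control flow or non-elementwise operations would presumably need to be rewritten before it could be called once over the transformed data.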

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Cited by 5 articles.

1. Research on vectorized engineering file management model;Applied Mathematics and Nonlinear Sciences;2023-06-30

2. Quantifying the interpretation overhead of Python;Science of Computer Programming;2022-03

3. Run-time data analysis to drive compiler optimizations;Companion Proceedings of the 2021 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity;2021-10-17

4. Improving database query performance with automatic fusion;Proceedings of the 29th International Conference on Compiler Construction;2020-02-22

5. An Efficient Gaussian Kernel Based Fuzzy-Rough Set Approach for Feature Selection;Lecture Notes in Computer Science;2016
