The Long Way to Deforestation: A Type Inference and Elaboration Technique for Removing Intermediate Data Structures

Author:

Chen Yijia (1), Parreaux Lionel (1)

Affiliation:

1. The Hong Kong University of Science and Technology, Hong Kong, China

Abstract

Deforestation is a compiler optimization that removes intermediate data structure allocations from functional programs to improve their efficiency. This is an old idea, but previous approaches have proved limited or impractical: they either only worked on compositions of predefined combinators (shortcut fusion), or involved the aggressive unfolding of recursive definitions until a depth limit was reached or a recurring pattern was found to tie the recursive knot, resulting in impractical algorithmic complexity and large amounts of code duplication. We present Lumberhack, a general-purpose deforestation approach for purely functional call-by-value programs. Lumberhack uses subtype inference to reason about data structure production and consumption and uses an elaboration pass to fuse the corresponding recursive definitions. It fuses large classes of mutually recursive definitions while avoiding much of the unproductive (and sometimes counter-productive) code duplication inherent in previous approaches. We prove the soundness of Lumberhack using logical relations and experimentally demonstrate significant speedups in the standard nofib benchmark suite. We manually adapted nofib programs to call-by-value semantics and compiled them using the OCaml compiler. The average speedup over the 38 benchmarked programs is 8.2% while the average code size increases by just about 1.79x. In particular, 19 programs see their performance mostly unchanged, 17 programs improve significantly (by an average speedup of 16.6%), and only three programs visibly worsen (by an average slowdown of 1.8%). As a point of comparison, we measured that the well-proven but semi-manual list fusion technique of the Glasgow Haskell Compiler (GHC), which only works for call-by-need programs, had an average speedup of 6.5%. Our technique is still in its infancy and misses many deforestation opportunities. We are confident that further refinements to the core technique will yield greater performance improvements in the future.
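
To make the notion of an intermediate data structure concrete, the following is a minimal hand-written sketch in OCaml of the kind of producer/consumer pair that deforestation targets. It is not Lumberhack's output or algorithm: the function names (`sum_of_squares_unfused`, `sum_of_squares_fused`) are hypothetical, and the fused version was merged by hand purely to show what removing the intermediate list means.

```ocaml
(* Illustrative only: hand-written "before" and "after" versions of a
   small pipeline. All names here are hypothetical and are not taken
   from Lumberhack. *)

(* Producer: builds a new list by applying f to each element. *)
let rec map f xs =
  match xs with
  | [] -> []
  | x :: rest -> f x :: map f rest

(* Consumer: traverses a list and adds up its elements. *)
let rec sum xs =
  match xs with
  | [] -> 0
  | x :: rest -> x + sum rest

(* Unfused: map allocates one cons cell per element, and sum
   immediately consumes and discards that intermediate list. *)
let sum_of_squares_unfused xs = sum (map (fun x -> x * x) xs)

(* Fused by hand: the producer and consumer are merged into a single
   recursive definition, so no intermediate list is allocated. *)
let rec sum_of_squares_fused xs =
  match xs with
  | [] -> 0
  | x :: rest -> (x * x) + sum_of_squares_fused rest

let () =
  let input = [1; 2; 3; 4] in
  (* Both versions compute the same result. *)
  assert (sum_of_squares_unfused input = sum_of_squares_fused input);
  Printf.printf "%d\n" (sum_of_squares_fused input)
```

Both definitions return the same value for every input list; only the allocation behaviour differs. An automatic deforestation pass aims to derive something like the second definition from the first without the programmer writing it.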

Funder

Hong Kong Research Grant Council

Publisher

Association for Computing Machinery (ACM)

