SIMD defragmenter

Published: 2012-04-18 Issue: 1 Volume: 40 Page: 363-374
ISSN: 0163-5964
Container-title: ACM SIGARCH Computer Architecture News
Language: en
Short-container-title: SIGARCH Comput. Archit. News

Author:

Park Yongjun¹,Seo Sangwon¹,Park Hyunchul¹,Cho Hyoun Kyu¹,Mahlke Scott¹

Affiliation:

1. University of Michigan, Ann Arbor, MI, USA

Abstract

Single-instruction multiple-data (SIMD) accelerators provide an energy-efficient platform to scale the performance of mobile systems while still retaining post-programmability. The central challenge is translating the parallel resources of the SIMD hardware into real application performance. In scientific applications, automatic vectorization techniques have proven quite effective at extracting large levels of data-level parallelism (DLP). However, vectorization is often much less effective for media applications due to low trip count loops, complex control flow, and non-uniform execution behavior. As a result, SIMD lanes remain idle due to insufficient DLP. To attack this problem, this paper proposes a new vectorization pass called SIMD Defragmenter to uncover hidden DLP that lurks below the surface in the form of instruction-level parallelism (ILP). The difficulty is managing the data packing/unpacking overhead that can easily exceed the benefits gained through SIMD execution. The SIMD Defragmenter overcomes this problem by identifying groups of compatible instructions (subgraphs) that can be executed in parallel across the SIMD lanes. By SIMDizing in bulk at the subgraph level, packing/unpacking overhead is minimized. On a 16-lane SIMD processor, experimental results show that SIMD defragmentation achieves a mean 1.6x speedup over traditional loop vectorization and a 31% gain over prior research approaches for converting ILP to DLP.
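
The packing/unpacking trade-off described in the abstract can be illustrated with a toy cycle model. This is only a sketch under stated assumptions, not the paper's algorithm or cost model: the function names, the one-cycle-per-operation cost, the strictly sequential scalar baseline, and the pack/unpack costs are all hypothetical values chosen for illustration.

```python
def scalar_cycles(chain_len, num_chains):
    """Cycles to run every op of every chain one at a time (assumed 1 cycle per op)."""
    return chain_len * num_chains


def simd_cycles(chain_len, pack_cost, unpack_cost, per_op_packing):
    """Cycles to run identical chains side by side, one chain per SIMD lane.

    per_op_packing=True  : pack and unpack around every single vector op.
    per_op_packing=False : pack once at subgraph entry, unpack once at exit.
    """
    if per_op_packing:
        return chain_len * (pack_cost + 1 + unpack_cost)
    return pack_cost + chain_len + unpack_cost


# Hypothetical workload: 4 identical chains of 8 ops, pack/unpack cost 2 cycles each.
baseline = scalar_cycles(chain_len=8, num_chains=4)
fine_grained = simd_cycles(chain_len=8, pack_cost=2, unpack_cost=2, per_op_packing=True)
bulk = simd_cycles(chain_len=8, pack_cost=2, unpack_cost=2, per_op_packing=False)

print(baseline, fine_grained, bulk)
```

Under these assumed numbers, packing around every operation costs more than the scalar baseline, while packing once per group of operations costs less, which is the intuition behind SIMDizing in bulk at the subgraph level.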

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2189750.2151014

References: 34 articles.

1. Efficient Selection of Vector Instructions Using Dynamic Programming

2. Vector Processing as an Enabler for Software-Defined Radio in Handheld Devices

3. A programmable platform for software-defined radio

Cited by 6 articles.

1. GATE;Proceedings of the 56th Annual Design Automation Conference 2019;2019-06-02

2. Push versus pull-based loop fusion in query engines;Journal of Functional Programming;2018

3. FlexVec: auto-vectorization for irregular loops;ACM SIGPLAN Notices;2016-08

4. FlexVec: auto-vectorization for irregular loops;Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation;2016-06-02

5. Automatic Vectorization of Interleaved Data Revisited;ACM Transactions on Architecture and Code Optimization;2016-01-07