Loop unrolling optimization for dual SIMD extension

Published: 2024-05-24

Author:

Yao Jinyang¹, Liu Lili¹, Fu Xuanyu¹, Liu Wenbo¹, Wu Wei², Shan Zheng¹

Affiliation:

1. State Key Laboratory of Mathematical Engineering and Advanced Computing

2. National Research Center of Parallel Computer Engineering and Technology

Abstract

SIMD extensions play an increasingly important role in high-performance computing and artificial intelligence. To make full use of these units, manufacturers and research institutions have developed many SIMD-specific optimizations, one of which is pipeline optimization for dual SIMD extensions. The method presented here unrolls the vectorizable loops in a program so that the generated instructions can execute in parallel on the two SIMD extensions integrated in the processor. It is implemented as an optimization pass in GCC, a mainstream compiler, and is enabled with a single compilation option. Experiments were conducted on an SW421 processor using standard benchmark suites including SPEC CPU 2006 and NPB. They show that programs compiled with this pass can fully utilize the dual SIMD extension during execution. Compared with enabling auto-vectorization alone, the method speeds up multiple applications in the test set, improving execution efficiency by 5.6% on average and by up to 14%.
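The idea of unrolling a vectorizable loop so that two SIMD units have independent work can be illustrated with a minimal C sketch. This is not the GCC pass described in the abstract: the vector width `VLEN`, the function names, and the choice of a scaled vector addition as the kernel are all assumptions made for illustration. Each inner loop over `VLEN` elements stands in for one SIMD instruction. The unrolled version processes two such chunks per iteration with no dependence between them, which is the property a dual-pipeline scheduler would need in order to issue them together.

```c
#include <stddef.h>

/* Assumed SIMD width in floats; hypothetical, not taken from the paper. */
#define VLEN 4

/* Baseline: one vector-width chunk per loop iteration. */
void saxpy_vec(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
    for (; i + VLEN <= n; i += VLEN) {
        /* Stands in for a single SIMD multiply-add. */
        for (size_t k = 0; k < VLEN; k++)
            y[i + k] += a * x[i + k];
    }
    /* Scalar remainder for the last n % VLEN elements. */
    for (; i < n; i++)
        y[i] += a * x[i];
}

/* Unrolled by two: each iteration holds two independent chunks,
 * giving a dual SIMD pipeline two operations it could run in parallel. */
void saxpy_vec_unroll2(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
    for (; i + 2 * VLEN <= n; i += 2 * VLEN) {
        /* Chunk 0: elements i .. i+VLEN-1. */
        for (size_t k = 0; k < VLEN; k++)
            y[i + k] += a * x[i + k];
        /* Chunk 1: elements i+VLEN .. i+2*VLEN-1, independent of chunk 0. */
        for (size_t k = 0; k < VLEN; k++)
            y[i + VLEN + k] += a * x[i + VLEN + k];
    }
    /* At most one leftover full chunk. */
    for (; i + VLEN <= n; i += VLEN) {
        for (size_t k = 0; k < VLEN; k++)
            y[i + k] += a * x[i + k];
    }
    /* Scalar remainder. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

Both functions compute the same result for any `n`. Only the shape of the loop body differs, so any speedup would come from instruction-level parallelism rather than from doing less work.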

Publisher

Research Square Platform LLC
