Affiliation:
1. School of Computing Science and Engineering, VIT University, Vellore, India
Abstract
Performance improvement in modern processors is stagnating because of the power wall and memory wall problems. In general, the power wall problem is addressed by various vectorization design techniques, while the memory wall problem is mitigated through prefetching. In this paper, vectorization is achieved through the Single Instruction Multiple Data (SIMD) registers of current processors. Vectorization provides architecture-level optimization by reducing the number of instructions in the pipeline and by minimizing use of the multi-level memory hierarchy. These registers offer a more economical computing platform than a Graphics Processing Unit (GPU) for compute-intensive applications. This paper explores software prefetching via Streaming SIMD Extensions (SSE) instructions to mitigate the memory wall problem. The work quantifies the effect of vectorization and prefetching on the Matrix Vector Multiplication (MVM) kernel with dense and sparse structures. Both prefetching and vectorization reduce data and instruction cache pressure, thereby improving cache performance. The Intel VTune Amplifier is used to show the cache performance improvements in the kernel. Finally, experimental results demonstrate promising performance of the matrix kernel on an Intel Haswell processor. However, effective utilization of SIMD registers remains a programming challenge for developers.
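To make the two techniques concrete, the following is a minimal sketch of a dense MVM kernel that combines SSE vectorization with software prefetching. It is not the authors' implementation: the function names `mvm_scalar` and `mvm_sse_prefetch`, the row-major layout, and the `PREFETCH_DISTANCE` value are assumptions chosen for illustration, and a suitable prefetch distance would have to be tuned per machine.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, _mm_prefetch */

/* Hypothetical tuning knob: how many floats ahead of the current
 * position to request. Not taken from the paper. */
#define PREFETCH_DISTANCE 64

/* Scalar reference: y = A * x, with A stored row-major (rows x cols). */
void mvm_scalar(const float *a, const float *x, float *y,
                size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i) {
        float sum = 0.0f;
        for (size_t j = 0; j < cols; ++j)
            sum += a[i * cols + j] * x[j];
        y[i] = sum;
    }
}

/* Same product, processing four floats per iteration in a 128-bit
 * SSE register and issuing a software prefetch for matrix data that
 * will be needed a few iterations later. */
void mvm_sse_prefetch(const float *a, const float *x, float *y,
                      size_t rows, size_t cols)
{
    assert(a && x && y);
    const size_t total = rows * cols;

    for (size_t i = 0; i < rows; ++i) {
        const float *row = a + i * cols;
        __m128 acc = _mm_setzero_ps(); /* four partial sums */
        size_t j = 0;

        for (; j + 4 <= cols; j += 4) {
            /* Prefetch only while the target stays inside the matrix;
             * near a row's end this reaches into the next row. */
            size_t ahead = i * cols + j + PREFETCH_DISTANCE;
            if (ahead < total)
                _mm_prefetch((const char *)(a + ahead), _MM_HINT_T0);

            __m128 va = _mm_loadu_ps(row + j); /* unaligned loads */
            __m128 vx = _mm_loadu_ps(x + j);
            acc = _mm_add_ps(acc, _mm_mul_ps(va, vx));
        }

        /* Horizontal sum of the four lanes. */
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

        /* Scalar tail for column counts not divisible by four. */
        for (; j < cols; ++j)
            sum += row[j] * x[j];

        y[i] = sum;
    }
}
```

The vector loop issues roughly a quarter of the multiply and add instructions of the scalar loop, which is the instruction-count reduction the abstract refers to. The prefetch is only a hint to the hardware, so whether it helps depends on the matrix size and on how well the hardware prefetcher already handles the access pattern.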
Subject
Computer Networks and Communications
Cited by
4 articles.
1. VAIL: A Victim-Aware Cache Policy to improve NVM Lifetime for hybrid memory system;Parallel Computing;2019-09
2. VAIL;Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores;2018-02-24
3. WatCache: a workload-aware temporary cache on the compute side of HPC systems;The Journal of Supercomputing;2017-10-26
4. Prefetching-based metadata management in Advanced Multitenant Hadoop;The Journal of Supercomputing;2017-03-27