Affiliation:
1. School of Computer, National University of Defense Technology
Abstract
Data transfer from a host central processing unit (CPU) to an accelerator is a performance bottleneck for applications offloaded to accelerators such as general-purpose digital signal processors (GPDSPs), many integrated core (MIC) processors, and general-purpose graphics processing units (GPGPUs). Transferring non-contiguous data, strided data in particular, is complicated and inefficient. In this work, we present three approaches to transferring strided data for different scenarios: redundant copy (RC), selective copy (SC), and transfer after transformed (TaT). TaT is a space- and time-efficient method in which strided data are first transformed on the CPU and then transferred to the accelerator. We simulated regions-of-interest (ROI) coding to validate the proposed techniques. TaT was more space-efficient than RC and came close to SC in space savings, while wasting less time than SC.
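The difference between the three approaches can be illustrated with a minimal host-side sketch. It is not the authors' implementation: the abstract defines only TaT explicitly, so the readings of RC (one transfer of the whole enclosing span, gaps included) and SC (one small transfer per strided element) are assumptions, as are the function names, the one-byte element size, and the use of a returned byte string plus a transfer count in place of a real accelerator copy.

```python
# Hypothetical model: a "transfer" is represented by the bytes that would be
# sent plus the number of separate copy operations needed to send them.

def redundant_copy(host, start, stride, count):
    """Assumed RC: send the whole enclosing span, unused gap bytes included."""
    end = start + (count - 1) * stride + 1
    return host[start:end], 1

def selective_copy(host, start, stride, count):
    """Assumed SC: send only the wanted elements, one small transfer each."""
    chunks = [host[start + i * stride:start + i * stride + 1] for i in range(count)]
    return b"".join(chunks), count

def transfer_after_transform(host, start, stride, count):
    """TaT: pack the strided elements contiguously on the CPU, then send once."""
    end = start + (count - 1) * stride + 1
    packed = bytes(host[start:end:stride])  # CPU-side transformation step
    return packed, 1

host = bytes(range(32))                      # stand-in for host memory
rc_data, rc_transfers = redundant_copy(host, 2, 4, 5)
sc_data, sc_transfers = selective_copy(host, 2, 4, 5)
tat_data, tat_transfers = transfer_after_transform(host, 2, 4, 5)

print(len(rc_data), rc_transfers)    # bytes sent and transfers for RC
print(len(sc_data), sc_transfers)    # bytes sent and transfers for SC
print(len(tat_data), tat_transfers)  # bytes sent and transfers for TaT
```

Under these assumptions, RC sends extra bytes in a single transfer, SC sends only the needed bytes but in many small transfers, and TaT sends only the needed bytes in a single transfer at the cost of a packing pass on the CPU.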
Subject
Materials Chemistry,Polymers and Plastics,Process Chemistry and Technology