A portable C++ library for memory and compute abstraction on multi‐core CPUs and GPUs

Published:2023-07-24 Issue:25 Volume:35
ISSN:1532-0626
Container-title:Concurrency and Computation: Practice and Experience
language:en
Short-container-title:Concurrency and Computation

Author:

Incardona Pietro¹²³⁴, Gupta Aryaman¹²³, Yaskovets Serhii¹²³, Sbalzarini Ivo F.¹²³

Affiliation:

1. Faculty of Computer Science Technische Universität Dresden Dresden Germany

2. Max Planck Institute of Molecular Cell Biology and Genetics Dresden Germany

3. Center for Systems Biology Dresden Dresden Germany

4. Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI Leipzig Germany

Abstract

We present a C++ library for transparent memory and compute abstraction across CPU and GPU architectures. Our library combines generic data structures like vectors, multi‐dimensional arrays, maps, graphs, and sparse grids with basic generic algorithms like arbitrary‐dimensional convolutions, copying, merging, sorting, prefix sum, reductions, neighbor search, and filtering. The memory layout of the data structures is adapted at compile time using C++ tuples with optional memory double‐mapping between host and device and the capability of using memory managed by external libraries with no data copying. We combine this transparent memory layout with generic thread‐parallel algorithms under two alternative common interfaces: a CUDA‐like kernel interface and a lambda‐function interface. We quantify the memory and compute performance and portability of our implementation using micro‐benchmarks, showing that the abstractions introduce negligible performance overhead, and we compare performance against the current state of the art in a real‐world scientific application from computational fluid mechanics.
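
The idea of choosing a memory layout at compile time with C++ tuples, and driving it through a lambda-function interface, can be illustrated with a minimal sketch. This is not the library's API: `layout_aos`, `layout_soa`, `prop_vector`, `for_each_index`, and `scale_and_sum` are hypothetical names introduced here for illustration only, and the loop runs serially on the host where a real backend would dispatch to CPU threads or a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical layout tags (illustrative; not names from the paper's library).
struct layout_aos {};  // array of structures: one tuple per element
struct layout_soa {};  // structure of arrays: one array per property

template <typename Layout, typename... Props>
class prop_vector;

// AoS specialization: a single contiguous array of tuples.
template <typename... Props>
class prop_vector<layout_aos, Props...> {
    std::vector<std::tuple<Props...>> data_;
public:
    explicit prop_vector(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }
    // Access property I of element i.
    template <std::size_t I>
    auto& get(std::size_t i) { return std::get<I>(data_[i]); }
};

// SoA specialization: a tuple of contiguous arrays, one per property.
template <typename... Props>
class prop_vector<layout_soa, Props...> {
    std::tuple<std::vector<Props>...> data_;
    std::size_t n_;
public:
    explicit prop_vector(std::size_t n)
        : data_(std::vector<Props>(n)...), n_(n) {}
    std::size_t size() const { return n_; }
    // Same accessor signature as the AoS version, different storage.
    template <std::size_t I>
    auto& get(std::size_t i) { return std::get<I>(data_)[i]; }
};

// Lambda-style interface. Assumption: a serial host loop stands in for
// whatever parallel backend a real implementation would select.
template <typename F>
void for_each_index(std::size_t n, F&& body) {
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// Layout-agnostic user code: fills property 0 with the index, stores
// factor * index in property 1, and returns the sum of property 1.
template <typename Vec>
double scale_and_sum(Vec& v, float factor) {
    for_each_index(v.size(), [&](std::size_t i) {
        v.template get<0>(i) = static_cast<float>(i);
        v.template get<1>(i) = factor * v.template get<0>(i);
    });
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v.template get<1>(i);
    return sum;
}
```

In this sketch, `scale_and_sum` compiles unchanged against `prop_vector<layout_aos, float, double>` and `prop_vector<layout_soa, float, double>`, so switching the layout is a one-token change at the declaration site and costs nothing at run time.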

Funder

Bundesministerium für Bildung und Forschung

Publisher

Wiley

Subject

Computational Theory and Mathematics, Computer Networks and Communications, Computer Science Applications, Theoretical Computer Science, Software

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.7870

References (18 articles)

1. Abstractions and Middleware for Petascale Computing and Beyond

2. Kokkos 3: Programming Model Extensions for the Exascale Era

3. Edwards HC. Enabling manycore performance portability through polymorphic memory access patterns. J Parallel Distrib Comput. 2014.

4. Zenker E, Worpitz B, Widera R, et al. Alpaka – an abstraction library for parallel kernel acceleration. IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 2016:631‐640.

5. Beckingsale D, Burmark J, Hornung R, et al. RAJA: portable performance for large‐scale scientific applications. IEEE/ACM International Workshop on Performance Portability and Productivity in HPC (P3HPC). 2019:71‐81.

Cited by 2 articles.

1. A high-order fully Lagrangian particle level-set method for dynamic surfaces. Journal of Computational Physics. 2024-10.

2. Morphogen gradients are regulated by porous media characteristics of the developing tissue. 2024-04-06.