Task-based programming in COMPSs to converge from HPC to big data

Published:2017-04-06 Issue:1 Volume:32 Page:45-60
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en
Author:

Conejero Javier¹, Corella Sandra¹, Badia Rosa M¹,², Labarta Jesus¹

Affiliation:

1. Barcelona Supercomputing Center (BSC), Barcelona, Spain

2. Institut d’Investigació en Intel·ligència Artificial – Consejo Superior de Investigaciones Científicas (IIIA – CSIC), Barcelona, Spain

Abstract

Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Different implementations have been good demonstrators of this fact and have promoted the acceptance of task-based programming in the OpenMP standard. Furthermore, in recent years, Apache Spark has gained wide popularity in business and research environments as a programming model for addressing emerging big data problems. COMP Superscalar (COMPSs) is a task-based environment that tackles distributed computing (including Clouds) and is a good alternative as a task-based programming model for big data applications. This article describes why we consider task-based programming models a good approach for big data applications. The article includes a comparison of Spark and COMPSs in terms of architecture, programming model, and performance. It focuses on the structural differences between the two frameworks, on their programmability interfaces, and on their efficiency, measured by means of three widely known benchmarking kernels: Wordcount, Kmeans, and Terasort. These kernels enable the evaluation of the more important functionalities of both programming models and the analysis of different workflows and conditions. The main results achieved from this comparison are (1) COMPSs is able to extract the inherent parallelism from the user code with minimal coding effort, as opposed to Spark, which requires the existing algorithms to be adapted and rewritten by explicitly using its predefined functions, (2) COMPSs improves on Spark in terms of performance, and (3) COMPSs has been shown to scale better than Spark in most cases. Finally, we discuss the advantages and disadvantages of both frameworks, highlighting the differences that make them unique, thereby helping to choose the right framework for each particular objective.

Publisher

SAGE Publications

Subject

Hardware and Architecture,Theoretical Computer Science,Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342017701278

Reference: 18 articles.

1. COMP Superscalar, an interoperable programming framework

2. Java thread and process performance for parallel machine learning on multicore HPC clusters

3. Comparing Apache Spark and Map Reduce with Performance Analysis using K-Means

Cited by 23 articles.

1. Enhancing iteration performance on distributed task-based workflows;Future Generation Computer Systems;2023-12

2. Role-shifting threads: Increasing OpenMP malleability to address load imbalance at MPI and OpenMP;The International Journal of High Performance Computing Applications;2023-10-21

3. Automatically balancing relocatable distributed collections;Concurrency and Computation: Practice and Experience;2023-04-23

4. PuzzleMesh: A Puzzle Model to Build Mesh of Agnostic Services for Edge-Fog-Cloud;IEEE Transactions on Services Computing;2023-03-01

5. An Earlier Experiences Towards Optimizing Apache Spark Over Frontera Supercomputer;Lecture Notes in Computer Science;2023