Scientific Workflows in IoT Environments: A Data Placement Strategy Based on Heterogeneous Edge-Cloud Computing

Published: 2022-08-10 Issue: 4 Volume: 13 Page: 1-26
ISSN: 2158-656X
Container-title: ACM Transactions on Management Information Systems
Language: en
Short-container-title: ACM Trans. Manage. Inf. Syst.

Author:

Du Xin¹, Tang Songtao¹, Lu Zhihui², Gai Keke³, Wu Jie⁴, Hung Patrick C. K.⁵

Affiliation:

1. School of Computer Science, Fudan University, Shanghai, China and Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai, China

2. School of Computer Science, Fudan University, Shanghai, China and Shanghai Blockchain Engineering Research Center, Shanghai, China

3. School of Cyberspace Security, Beijing Institute of Technology, Beijing, China

4. School of Computer Science, Fudan University, Shanghai, China and Peng Cheng Laboratory, Shenzhen, China

5. Faculty of Business and Information Technology, Ontario Tech University, Oshawa, Ontario, Canada

Abstract

In Industry 4.0 and Internet of Things (IoT) environments, the heterogeneous edge-cloud computing paradigm can provide a more suitable way to deploy scientific workflows than cloud computing or other traditional distributed computing paradigms. Owing to the different sizes of scientific datasets and the privacy concerns surrounding some of them, it is essential to find a data placement strategy that minimizes data transmission time. Some state-of-the-art data placement strategies combine edge computing and cloud computing to distribute scientific datasets. However, dynamically distributing newly generated datasets to appropriate datacenters and removing spent datasets during workflow execution remain a challenge. To address this challenge, this study constructs a data placement model that includes datasets shared within individual workflows and among multiple workflows across various geographical regions, and proposes a two-stage data placement strategy (DYM-RL-DPS). First, during the build-time stage of workflows, we use the discrete particle swarm optimization algorithm with differential evolution to pre-allocate initial datasets to proper datacenters. Then, we reformulate the dynamic dataset distribution problem as a Markov decision process and provide a reinforcement learning–based approach to learn the data placement strategy in the runtime stage of scientific workflows. Using the heterogeneous edge-cloud computing architecture to simulate IoT environments, we designed comprehensive experiments to demonstrate the superiority of DYM-RL-DPS. The results show that our strategy effectively reduces data transmission time compared to other strategies.

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Shanghai Science and Technology Innovation Action Plan Project

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Management Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3531327
