Computational storage: an efficient and scalable platform for big data and HPC applications

Published:2019-11-15 Issue:1 Volume:6 Page:
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Torabzadehkashi Mahdi, Rezaei Siavash, HeydariGorji Ali, Bobarshad Hosein, Alves Vladimir, Bagherzadeh Nader

Abstract

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost has a direct relation with the distance of processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move process closer to data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform, that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported for running on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications on the underlying distributed processing framework. For the proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in the distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption, respectively, for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.

Funder

National Science Foundation

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management, Computer Networks and Communications, Hardware and Architecture, Information Systems

Link

http://link.springer.com/content/pdf/10.1186/s40537-019-0265-5.pdf

References: 46 articles.

1. James J. Data never sleeps 6.0. 2018. https://www.domo.com/blog/data-never-sleeps-6. Accessed 11 Nov 2019.

2. Javani A, Zorgui M, Wang Z. Age of information in multiple sensing. 2019. arXiv:1902.01975.

3. Kitchin R, McArdle G. What makes big data, big data? exploring the ontological characteristics of 26 datasets. Big Data Soc. 2016;3:1–10.

4. Pfister GF. An introduction to the InfiniBand architecture. In: High performance mass storage and parallel I/O. 2001;42:617–32.

5. Boden NJ, Cohen D, Felderman RE, Kulawik AE, Seitz CL, Seizovic JN, Su W-K. Myrinet: a gigabit-per-second local area network. IEEE Micro. 1995;15(1):29–36.

Cited by 19 articles.

1. Design and performance analysis of modern computational storage devices: A systematic review;Expert Systems with Applications;2024-09

2. Synthetic data generation using Copula model and driving behavior analysis;Ain Shams Engineering Journal;2024-09

3. Study on tiered storage algorithm based on heat correlation of astronomical data;Frontiers in Astronomy and Space Sciences;2024-03-14

4. A Novel Architecture of CXL Protocol Data Link Layer for Low Latency Memory Access;2023 International Conference on Microelectronics (ICM);2023-12-17

5. Computational Storage for 3D NAND Flash Error Recovery Flow Prediction;Lecture Notes in Electrical Engineering;2023-11-29