BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink

Published: 2019-07-27 Issue: 1 Volume: 8 Page: 14
ISSN: 2676-7104
Container-title: Frontiers in Health Informatics
Short-container-title: Front Health Inform

Author:

Nazari Elham, Shahriari Mohammad Hasan, Tabesh Hamed

Abstract

Introduction: Health care data is increasing. Correct analysis of this data can improve the quality of care and reduce costs. Such data is characterized by high volume, variety, and high-speed production, among other features, which makes it impossible to analyze with ordinary hardware and software platforms. Choosing the right platform for managing this kind of data is therefore very important. The purpose of this study is to introduce and compare the most popular and widely used platform for processing big data, Apache Hadoop MapReduce, with two platforms that have recently gained great prominence, Apache Spark and Apache Flink.

Material and Methods: This study is a survey based on subject searches of the ProQuest, PubMed, Google Scholar, ScienceDirect, Scopus, IranMedex, Irandoc, Magiran, ParsMedline and Scientific Information Database (SID) databases, as well as web reviews and specialized books, using related and standard keywords. In total, 80 articles related to the subject of the study were reviewed.

Results: The findings showed that the studied platforms can be characterized by features such as data processing, support for different languages, processing speed, computational model, memory management, optimization, latency, fault tolerance, scalability, performance, compatibility, and security, among others. Overall, the Apache Hadoop environment offers simplicity, error detection, and cluster-based scalability management, but because it relies on batch processing, it is slow for complex analyses and does not support stream processing. Apache Spark is a distributed computational platform that can process a large data set in memory with a very fast response time. Apache Flink allows users to store data in memory and load it multiple times, and it provides a complex fault tolerance mechanism that continuously retrieves the state of the data flow.

Conclusion: The application of big data analysis and processing platforms varies according to need. In other words, the technologies are complementary: each is applicable in a particular field, and they cannot be separated from one another. Depending on the purpose and the expected outcome, either a platform must be selected for the analysis or custom tools must be designed on top of these platforms.

Publisher

Farname, Inc.

Cited by 32 articles.

1. Construction of large scale English teaching corpus and e-learning system based on Apache Hadoop algorithm;Entertainment Computing;2024-05

2. Building Advanced Web Applications Using Data Ingestion and Data Processing Tools;Electronics;2024-02-09

3. A survey of multimodal information fusion for smart healthcare: Mapping the journey from data to wisdom;Information Fusion;2024-02

4. Machine Learning and Deep Learning for Big Data Analysis;Advances in Business Information Systems and Analytics;2024-01-04

5. A conceptual framework for the ICU of the future evaluated by the MIMIC-III digital archive;Global Knowledge, Memory and Communication;2024-01-02