Affiliations:
1. UBS, USA
2. INTELLIGENTRABBIT LLC, USA
Abstract
This article provides a comprehensive overview of the cutting-edge big data workflow technologies that are widely applied in industrial settings, covering a broad range of current big data processing methods and tools, including Hadoop, Hive, MapReduce, Sqoop, Hue, Spark, Cloudera, Airflow, and GitLab. An industrial data workflow pipeline is proposed and investigated in terms of its system architecture, which is designed to meet the needs of data-driven industrial big data analytics applications focused on large-scale data processing. It differs from traditional data pipelines and workflows in combining ETL capabilities with analytical portals. The proposed data workflow can improve industrial analytics applications across multiple tasks. This article also provides big data researchers and professionals with an understanding of the challenges facing big data analytics in real-world environments and informs interdisciplinary studies in this field.
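To make the kind of workflow described above concrete, the following is a minimal, hypothetical sketch of how such an ETL pipeline could be orchestrated with Airflow, chaining a Sqoop ingest, a Spark transformation, and a Hive load step. It is not the authors' actual pipeline; the database connection, table names, file paths, and script names are illustrative assumptions.

```python
# Illustrative sketch only: a minimal Airflow DAG wiring together the kind of
# Sqoop -> Spark -> Hive ETL steps described in the abstract. All connection
# strings, table names, paths, and script names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

with DAG(
    dag_id="industrial_etl_pipeline",  # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Ingest relational data into HDFS with Sqoop (hypothetical JDBC source).
    ingest = BashOperator(
        task_id="sqoop_import",
        bash_command=(
            "sqoop import "
            "--connect jdbc:mysql://source-db:3306/plant "
            "--table sensor_readings "
            "--target-dir /data/raw/sensor_readings "
            "--num-mappers 4"
        ),
    )

    # Clean and transform the raw data with a Spark batch job (hypothetical script).
    transform = BashOperator(
        task_id="spark_transform",
        bash_command=(
            "spark-submit /jobs/clean_sensor_readings.py "
            "--input /data/raw/sensor_readings "
            "--output /data/curated/sensor_readings"
        ),
    )

    # Make the curated partitions visible to analysts through Hive.
    load = BashOperator(
        task_id="hive_load",
        bash_command='hive -e "MSCK REPAIR TABLE curated.sensor_readings"',
    )

    # Run the steps in sequence: ingest, then transform, then load.
    ingest >> transform >> load
```

In a setup like the one the article describes, such a DAG would typically be versioned in GitLab and deployed to a Cloudera-managed cluster, with Hue providing the query and browsing front end for analysts.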