Author:
Dinesh Lina, Devi K. Gayathri
Abstract
In big data analysis, data is collected from different sources in various formats, then cleansed, customized, and loaded into a data warehouse. Extracting data in other formats and converting it to the required format requires transformation algorithms. The transformation stage suffers from redundancy, and the transformed data can end up stored at any location in the data warehouse, which increases computation costs. The main issues in big data ETL are handling high-dimensional data and keeping similar data together for effective data warehouse usage. Extract, Transform, Load (ETL) therefore plays a vital role in extracting meaningful information from the data warehouse and in retaining users. This paper proposes a hybrid optimization that combines Swarm Intelligence with a Tabu Search algorithm for handling big data in a cloud-based ETL process. The proposed work overcomes many issues related to complex data storage and retrieval in the data warehouse. Swarm Intelligence algorithms can overcome problems such as high-dimensional data, dynamic changes in huge data volumes, and cost optimization in the transformation stage. In this work, a Grey Wolf Optimizer (GWO) serves as the swarm intelligence algorithm and is used to reduce the high dimensionality of the data. Tabu Search (TS) is used to cluster relevant data into groups; here, clustering means accurately segregating relevant data from the data warehouse. The proposed GWO-TS approach can optimize the cluster size in the ETL process, so that the huge volume of data in the warehouse can be processed within an expected latency.
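For readers unfamiliar with the Grey Wolf Optimizer named in the abstract, the sketch below shows the standard GWO position update in plain Python. It is not the paper's implementation: the abstract gives no objective function, encoding, or parameter values, so the function name `grey_wolf_optimize`, the test objective `sphere`, the bounds, and the population and iteration counts are all assumptions chosen for illustration. In an ETL setting the objective would presumably score a candidate feature subset or cluster configuration, and the Tabu Search stage is not shown.

```python
import random


def grey_wolf_optimize(objective, dim, bounds, n_wolves=20, n_iters=200, seed=0):
    """Minimize `objective` over a box using the standard GWO update.

    All parameters are illustrative assumptions, not values from the paper.
    Requires n_wolves >= 3 so that three leaders exist.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial pack inside the search box.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for t in range(n_iters):
        # The three best wolves (alpha, beta, delta) lead the hunt.
        # Copy them so in-place updates below do not move the leaders.
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        score = objective(alpha)
        if score < best_score:
            best_pos, best_score = alpha, score

        # Control parameter `a` decays linearly from 2 to 0,
        # shifting the pack from exploration to exploitation.
        a = 2.0 * (1.0 - t / n_iters)

        for wolf in wolves:
            for j in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a   # step coefficient
                    C = 2.0 * rng.random()           # leader weighting
                    D = abs(C * leader[j] - wolf[j])
                    estimate += leader[j] - A * D
                # New position: mean of the three leader-guided estimates,
                # clamped to the search box.
                wolf[j] = min(hi, max(lo, estimate / 3.0))

    # Account for the positions produced by the final update.
    final = min(wolves, key=objective)
    if objective(final) < best_score:
        best_pos, best_score = list(final), objective(final)
    return best_pos, best_score


if __name__ == "__main__":
    # Hypothetical objective: the sphere function, minimized at the origin.
    def sphere(x):
        return sum(v * v for v in x)

    position, value = grey_wolf_optimize(sphere, dim=3, bounds=(-10.0, 10.0))
    print(position, value)
```

To use GWO for dimensionality reduction as the abstract describes, one common approach is to threshold each coordinate into a keep/drop decision per feature; whether the authors do this is not stated in the abstract.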
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Software