Authors:
Sais Manar, Rafalia Najat, Mahdaoui Rabie, Abouchabaka Jaafar
Abstract
Understanding data and extracting information from it are the main objectives of data science, especially when it comes to big data. Achieving these goals requires collecting and processing massive data sets that arrive at the system in different formats and at high velocity. The Big Data era has brought new challenges in data storage and management: existing state-of-the-art storage and processing tools are poised to meet these challenges, while also posing new challenges for the next generation of data. Big Data storage optimization is essential for improving the overall efficiency of Big Data systems because it maximizes the use of storage resources. It also reduces the energy consumption of these systems, resulting in financial savings, environmental protection, and improved system performance. Hadoop provides a solution for storing and analysing large quantities of data. However, Hadoop can encounter storage management problems because of its distributed nature and the large volumes of data it manages. To meet future challenges, the system needs to manage its storage intelligently. A multi-agent system is a promising approach for efficiently managing hot and cold data in HDFS, since such systems offer a flexible, distributed way to solve complex problems. This work proposes an approach based on a multi-agent system that gathers information on data access activity in the HDFS cluster. Using this information, it classifies data according to its temperature (hot or cold) and decides how to replicate each piece of data based on that classification. In addition, it compresses unused data to manage resources efficiently and reduce storage space usage.
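The monitor, classify, and decide loop described in the abstract can be sketched as two cooperating agents. This is a minimal single-process sketch under stated assumptions, not the authors' implementation: the agent classes, the access-count threshold, the replication factors chosen for hot and cold data, and the rule that a file with zero accesses in the observation window counts as "unused" and should be compressed are all hypothetical. The sketch only computes decisions; applying them to a real cluster (for example, changing a file's replication factor with `hdfs dfs -setrep`, where the HDFS default is 3) is left out.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical values: the abstract does not specify thresholds or factors.
HOT_THRESHOLD = 10     # accesses per window at or above which a file is "hot"
HOT_REPLICATION = 5    # assumed replication factor for hot files
COLD_REPLICATION = 2   # assumed replication factor for cold files


@dataclass
class Decision:
    path: str
    temperature: str   # "hot" or "cold"
    replication: int   # target replication factor
    compress: bool     # True if the file was never accessed in the window


class MonitorAgent:
    """Hypothetical agent that counts access events seen during one window."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record(self, path: str) -> None:
        self.counts[path] += 1


class DecisionAgent:
    """Hypothetical agent that classifies files and decides replication/compression."""

    def __init__(self, hot_threshold: int = HOT_THRESHOLD) -> None:
        self.hot_threshold = hot_threshold

    def decide(self, known_files: list[str], counts: Counter) -> list[Decision]:
        decisions = []
        for path in known_files:
            accesses = counts.get(path, 0)   # files with no events default to 0
            hot = accesses >= self.hot_threshold
            decisions.append(Decision(
                path=path,
                temperature="hot" if hot else "cold",
                replication=HOT_REPLICATION if hot else COLD_REPLICATION,
                compress=accesses == 0,      # "unused" data is marked for compression
            ))
        return decisions


if __name__ == "__main__":
    monitor = MonitorAgent()
    for _ in range(12):
        monitor.record("/data/logs/today.log")
    monitor.record("/data/reports/q1.csv")

    files = ["/data/logs/today.log", "/data/reports/q1.csv", "/data/archive/2019.csv"]
    for decision in DecisionAgent().decide(files, monitor.counts):
        print(decision)
```

Separating monitoring from decision-making mirrors the distributed flavour of a multi-agent design: in a real deployment, several monitor agents could feed counts to a decision agent, whereas here both run in one process for clarity.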