Affiliation:
1. Iraqi Commission for Computers and Informatics, Informatics Institute of Postgraduate Studies, Baghdad-Iraq
2. University of Information Technology and Communication (UoITC), Baghdad-Iraq
Abstract
The explosive growth of data such as images, video, audio, and text has caused significant problems in data storage and retrieval. Companies and organizations spend a lot of money to store and manage data, so there is an urgent need for efficient technologies to handle this volume. Among data reduction techniques, data deduplication is one of the most essential and effective for eliminating redundant data. By removing redundant data, it decreases bandwidth consumption, hard disk drive utilization, and backup costs. This paper reviews the literature on data deduplication, covering the various techniques that researchers have proposed. It summarizes the main concepts and techniques related to deduplication and the methods used to improve storage. The data deduplication process is examined in detail, including data chunking, hashing, indexing, and writing. The study also discusses the most critical problems faced by data deduplication algorithms.
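To make the four stages named above concrete (chunking, hashing, indexing, and writing), the following is a minimal illustrative sketch rather than any specific method from the surveyed literature. It assumes fixed-size chunking, SHA-256 fingerprints, and an in-memory dictionary standing in for both the index and the chunk store. The names `deduplicate`, `restore`, `store`, and `recipe` are hypothetical and chosen only for this example.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk once."""
    store = {}   # fingerprint -> chunk bytes (assumed stand-in for index + disk)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]         # chunking
        fingerprint = hashlib.sha256(chunk).hexdigest()  # hashing
        if fingerprint not in store:                     # indexing: lookup
            store[fingerprint] = chunk                   # writing: new chunks only
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from the stored chunks."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    # Four 4 KiB chunks, three of which are identical.
    data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
    store, recipe = deduplicate(data)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    assert restore(store, recipe) == data
```

In this sketch, only previously unseen chunks are written, while the recipe records the order needed for reconstruction. Practical systems typically replace the in-memory dictionary with a persistent index and may use content-defined rather than fixed-size chunking.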