Abstract
Reducing the storage space used by relational database management systems (DBMSs) such as Microsoft SQL Server (MS SQL) and MySQL remains a challenge. A DBMS is vital to any local area network (LAN)-based application, including Web-based and mobile applications, for storing and managing data. These data leave traces on servers and are retained on storage devices for a very long time. Because of heavy program usage and potential user error, the data eventually need to be examined for abnormalities and integrity. This chapter discusses approaches to duplicate detection. In addition, the Levenshtein algorithm was implemented to detect duplicate records in the database, alongside a method for matching multiple fields of records and rule-based techniques. The chapter contributes knowledge on data cleansing and on reducing the storage space used by applications, helping to maximize the storage space of data centers.
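To illustrate the kind of matching the abstract describes, the following is a minimal sketch of Levenshtein-based duplicate detection across multiple fields. It is not the chapter's implementation: the function names, the averaging of per-field scores, the example field names, and the 0.8 threshold are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to a 0..1 score after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty values are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def is_duplicate(rec_a: dict, rec_b: dict, fields: list, threshold: float = 0.8) -> bool:
    """Assumed rule: flag a pair when the mean per-field similarity reaches the threshold."""
    scores = [similarity(str(rec_a[f]), str(rec_b[f])) for f in fields]
    return sum(scores) / len(scores) >= threshold


# Hypothetical records; the field names are illustrative only.
r1 = {"name": "Jonathan Smith", "city": "Manila"}
r2 = {"name": "Jonathon Smith", "city": "Manila "}
r3 = {"name": "Maria Cruz", "city": "Cebu"}
print(is_duplicate(r1, r2, ["name", "city"]))  # near-identical pair
print(is_duplicate(r1, r3, ["name", "city"]))  # unrelated pair
```

Averaging the field scores is only one possible rule. A rule-based variant could instead require an exact match on a key field and apply the similarity threshold to the remaining fields.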