Reducing Database Storage Space by Eliminating Duplicate Records

Published: 2024-07-17
Container title: Advances in Digital Transformation - Rise of Ultra-Smart Fully Automated Cyberspace
Language: English

Author:

S. Valeriano Eugene

Abstract

Reducing the storage space used by relational database management systems (DBMSs) such as Microsoft SQL Server (MS SQL) and MySQL is increasingly challenging. A DBMS is essential for storing and managing data in any local area network (LAN)-based application, including Web-based and mobile applications. These data leave traces on servers and are retained on storage devices for a very long time. Because of heavy program usage and potential user error, the data eventually need to be examined for anomalies and integrity problems. This chapter discusses approaches to duplicate-detection algorithms. In addition, the Levenshtein algorithm was implemented to detect duplicate records in the database, alongside a method for matching multiple fields across records and rule-based techniques. This work contributes knowledge on data cleansing and on reducing the storage space used by applications, and it helps maximize the usable storage space of data centers.
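The abstract names three ingredients (Levenshtein distance, multi-field matching, and a rule-based decision) without showing how they fit together. The following is a minimal Python sketch under stated assumptions, not the chapter's implementation: the field names (`name`, `email`, `city`), the per-field thresholds in `RULES`, the sample rows, and the helper names `similarity`, `is_duplicate` and `deduplicate` are all hypothetical. The rule shown ("every listed field must reach its similarity threshold") is just one simple rule-based policy; the chapter's actual rules may differ.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; case and extra whitespace are ignored."""
    a, b = " ".join(a.lower().split()), " ".join(b.lower().split())
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty values are treated as identical
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical rule set: minimum similarity each field must reach.
RULES: Dict[str, float] = {"name": 0.8, "email": 1.0, "city": 0.9}


def is_duplicate(r1: Record, r2: Record, rules: Dict[str, float]) -> bool:
    """Rule-based, multi-field match: all listed fields must pass."""
    return all(similarity(r1.get(f, ""), r2.get(f, "")) >= t
               for f, t in rules.items())


def deduplicate(records: List[Record], rules: Dict[str, float]
                ) -> Tuple[List[Record], List[Tuple[int, int]]]:
    """Keep the first occurrence of each record and report
    (kept index, duplicate index) pairs. Pairwise, so O(n^2)."""
    kept: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for idx, rec in enumerate(records):
        match: Optional[int] = next(
            (k for k in kept if is_duplicate(records[k], rec, rules)), None)
        if match is None:
            kept.append(idx)
        else:
            pairs.append((match, idx))
    return [records[k] for k in kept], pairs


# Hypothetical sample rows; row 1 is a near-duplicate of row 0.
SAMPLE_RECORDS: List[Record] = [
    {"name": "John Smith", "email": "john.smith@example.com", "city": "Manila"},
    {"name": "Jon Smith", "email": "john.smith@example.com", "city": "manila"},
    {"name": "Jane Doe", "email": "jane.doe@example.com", "city": "Cebu"},
]

if __name__ == "__main__":
    unique, dups = deduplicate(SAMPLE_RECORDS, RULES)
    print(len(unique), dups)  # → 2 [(0, 1)]
```

Here "Jon Smith" differs from "John Smith" by one deletion, giving a name similarity of 0.9, and the email and city fields match after normalization, so the second row is flagged. Because every record is compared with every retained record, a practical system would usually narrow the candidate pairs first (for example by grouping rows on a shared key) and review flagged pairs before deleting rows from the table.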

Publisher

IntechOpen

Link

https://intech-files.s3.amazonaws.com/a043Y00000yJC5zQAG/a093Y00001h7mIrQAI/Final-Reducing%20Database%20Storage%20Space%20by%20Eliminating%20Dup%20%282024-06-10%2012%3A05%3A14%29.pdf

References (30 articles)

1. de Carvalho MG, Laender AHF, Goncalves MA, da Silva AS. A genetic programming approach to record deduplication. IEEE Transactions on Knowledge and Data Engineering. 2012;(3):399-412. DOI: 10.1109/TKDE.2010.234

2. Karunakaran D, Rangaswamy R. A Method for Duplicate Record Detection by Exploration and Exploitation of Optimization Algorithm. 2013. [Online]. Available from:

3. Harnik D, Pinkas B, Shulman-Peleg A. Side Channels in Cloud Services, the Case of Deduplication in Cloud Storage. 2010. [Online]. Available from:

4. Tan Y, Jiang H, Feng D, Tian L, Yan Z, Zhou G. SAM: A semantic-aware multi-tiered source de-duplication framework for cloud backup. In: Proceedings of the International Conference on Parallel Processing. 2010. pp. 614-623. DOI: 10.1109/ICPP.2010.69

5. Bhagwat D, Eshghi K, Long DDE, Lillibridge M. Extreme Binning: Scalable, Parallel Deduplication for Chunk-Based File Backup. 2009.