Abstract
Record linkage is an important tool for database integration. It is especially valuable in a scenario of deeper budget cuts and declining response rates in traditional surveys, because linking records makes it possible to cross-tabulate variables that are not present in the original database. However, the literature describes many different record-pairing methods. The objective of this paper is therefore to compare well-known record linkage methods. The comparison was carried out on a synthetic dataset using a quantitative approach based on precision, recall, and F-measure, with two string comparison functions: Levenshtein and Jaro-Winkler. Among the six classifiers analyzed, the supervised methods achieved the best results.
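The abstract names the two comparison functions and the three evaluation metrics without defining them. The sketch below shows their standard textbook definitions. It is not the authors' implementation. It assumes several conventions that are not stated in the paper: similarities normalized to [0, 1], the common Jaro-Winkler defaults (prefix scale `p = 0.1`, prefix capped at 4 characters), and record matches represented as sets of record-ID pairs. All function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings (assumed convention)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaro(a: str, b: str) -> float:
    """Jaro similarity: shared characters within a window, penalized by transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_flags, b_flags = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    a_matched = [c for c, f in zip(a, a_flags) if f]
    b_matched = [c for c, f in zip(b, b_flags) if f]
    t = sum(x != y for x, y in zip(a_matched, b_matched)) / 2
    m = matches
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (assumed defaults)."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def precision_recall_f(predicted: set, true: set) -> tuple[float, float, float]:
    """Precision, recall and F-measure of predicted match pairs against true match pairs."""
    tp = len(predicted & true)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


print(levenshtein("kitten", "sitting"))               # 3
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))     # 0.9611
```

In a linkage pipeline, a classifier (threshold-based, probabilistic, or supervised) would decide from such similarity scores which record pairs to declare as matches. `precision_recall_f` then scores those decisions against the known true matches of the synthetic dataset.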
Publisher
South Florida Publishing LLC
Subject
Materials Science (miscellaneous)