Author:
On Byung-Won, Sang Choi Gyu, Jung Soo-Mok
Abstract
Purpose
– The purpose of this paper is to collect and understand the nature of real cases of author name variants that have often appeared in bibliographic digital libraries (DLs) as a case study of the name authority control problem in DLs.
Design/methodology/approach
– To find a sample of name variants across DLs (e.g. DBLP and ACM) and within a single DL (e.g. ACM), the authors use an approach based on two bipartite matching algorithms: Maximum Weighted Bipartite Matching and Maximum Cardinality Bipartite Matching.
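As a rough illustration of the two matching formulations named above, the following Python sketch pairs author names from two lists. It is not the authors' implementation: the abstract does not say how the bipartite graph is built, so the `similarity` metric (the standard library's `difflib.SequenceMatcher`), the `threshold` value, and the function names are all assumptions made for this example. The weighted variant uses exhaustive search, which is only practical for a handful of names.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (an assumed stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def max_cardinality_matching(left, right, threshold=0.8):
    """Pair as many names as possible, using only edges with similarity >= threshold.

    Uses the classic augmenting-path method on the thresholded graph.
    """
    adj = [[j for j, r in enumerate(right) if similarity(l, r) >= threshold]
           for l in left]
    owner = {}  # right index -> left index currently matched to it

    def augment(i, seen):
        # Try to give left node i a partner, re-routing earlier matches if needed.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in owner or augment(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(left)):
        augment(i, set())
    return sorted((i, j) for j, i in owner.items())


def max_weight_matching(left, right):
    """Pick the one-to-one pairing with the largest total similarity.

    Exhaustive search over assignments, so suitable only for tiny inputs.
    Because all weights are non-negative, matching every left name is optimal.
    """
    if len(left) > len(right):
        raise ValueError("this sketch expects len(left) <= len(right)")
    best_pairs, best_score = [], float("-inf")
    for perm in permutations(range(len(right)), len(left)):
        score = sum(similarity(left[i], right[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_pairs, best_score = list(enumerate(perm)), score
    return best_pairs, best_score


if __name__ == "__main__":
    # Hypothetical name lists standing in for records from two DLs.
    dl_a = ["Jeffrey D. Ullman", "Wei Wang"]
    dl_b = ["W. Wang", "J. D. Ullman"]
    print(max_cardinality_matching(dl_a, dl_b, threshold=0.5))
    print(max_weight_matching(dl_a, dl_b))
```

The cardinality version ignores how similar two names are once they pass the threshold, while the weighted version prefers the pairing with the highest total similarity; for larger inputs a polynomial-time assignment solver would replace the exhaustive search.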
Findings
– First, the authors validated the effectiveness and efficiency of the bipartite matching algorithms. The authors also studied the nature of real cases of author name variants that had been found across DLs (e.g. ACM, CiteSeer and DBLP) and in a single DL.
Originality/value
– To the best of the authors' knowledge, little research effort has been devoted to understanding the nature of author name variants in DLs. A thorough analysis can help focus research effort on the real problems that arise when duplicate detection methods are applied.
Subject
Library and Information Sciences, Information Systems
Cited by
3 articles.