Chinese Person Name Disambiguation Based on Two-Stage Clustering

Published: 2016-09-20 Issue: 5 Volume: 20 Page: 755-764
ISSN: 1883-8014
Container-title: Journal of Advanced Computational Intelligence and Intelligent Informatics
Language: en
Short-container-title: JACIII

Author:

Zhou Jie, Li Bicheng, Tang Yongwang

Abstract

Person name disambiguation by clustering partitions name mentions according to the real-world person entities they refer to. Existing methods cannot effectively identify the important features needed to disambiguate person names. This paper presents a method for Chinese person name disambiguation based on two-stage clustering. The method adopts a stage-by-stage processing model to identify and utilize different types of important features. First, we extract three kinds of core evidence, namely direct social relations, indirect social relations, and common description prefixes, recognize document pairs referring to the same person entity, and produce an initial clustering of person names with high precision. Then, we take the initial clustering result as new input, use the statistical properties of multiple documents to recognize and evaluate important features, and build a double-vector representation of clusters (a cluster feature vector and an important feature vector). Based on these steps, the final clustering of person names is generated, and clustering recall is effectively improved. Experiments were conducted on the CLP2010 Chinese person name disambiguation dataset, and the results show that the method performs well on person name clustering disambiguation.
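
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the document layout (`evidence` and `words` fields), the helper names, the cosine similarity measure, the greedy merge loop, and the `threshold` value are all assumptions made for illustration, and the paper's double-vector representation is simplified here to a single bag-of-words vector per cluster. Stage one links documents that share any core evidence; stage two merges the resulting clusters when their feature vectors are similar enough.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def stage_one(docs):
    """Initial clustering: link two documents if they share any core evidence."""
    parent = list(range(len(docs)))
    for i, j in combinations(range(len(docs)), 2):
        if docs[i]["evidence"] & docs[j]["evidence"]:
            parent[find(parent, i)] = find(parent, j)
    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(parent, i), []).append(i)
    return list(clusters.values())


def cluster_vector(cluster, docs):
    # Simplified cluster feature vector: summed word counts of member documents.
    vec = Counter()
    for i in cluster:
        vec.update(docs[i]["words"])
    return vec


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def stage_two(clusters, docs, threshold=0.5):
    """Final clustering: greedily merge the most similar pair until below threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        vecs = [cluster_vector(c, docs) for c in clusters]
        best, (a, b) = max(
            (cosine(vecs[x], vecs[y]), (x, y))
            for x, y in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Hypothetical toy documents that all mention the same ambiguous name.
docs = [
    {"evidence": {"coworker:Wang Fang"}, "words": ["professor", "university", "physics"]},
    {"evidence": {"coworker:Wang Fang"}, "words": ["professor", "physics", "lecture"]},
    {"evidence": set(), "words": ["physics", "professor", "research"]},
    {"evidence": set(), "words": ["football", "striker", "goal"]},
]

initial = stage_one(docs)
final = stage_two(initial, docs)
print(initial)
print(final)
```

In this toy run, the first stage groups only the two documents that share an explicit social relation, and the second stage then pulls in the third document through vocabulary overlap, which mirrors the precision-first, recall-second design the abstract describes.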

Publisher

Fuji Technology Press Ltd.

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction

Reference: 21 articles.

1. Most Common Male First Names in the U.S., http://names.mongabay.com/male_names.htm [Accessed Nov. 1, 2015].

2. J. Artiles, J. Gonzalo, and S. Sekine, “The SemEval-2007 WePS Evaluation: Establishing a benchmark for the Web People Search Task,” Proc. of the 4th Int. Workshop on Semantic Evaluations (SemEval-2007), pp. 64-69, 2007.

3. J. Artiles, J. Gonzalo, and S. Sekine, “WePS 2 Evaluation Campaign: Overview of the Web People Search Clustering Task,” Proc. of WWW Web People Search Evaluation Workshop, 2009.

4. J. Artiles, A. Borthwick, J. Gonzalo, et al., “WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks,” Proc. of 3rd Web People Search Evaluation Forum (WePS-3), CLEF, 2010.

5. Y. Chen, P. Jin, W. Li, and C. R. Huang, “The Chinese Persons Name Disambiguation Evaluation: Exploration of Personal Name Disambiguation in Chinese News,” Proc. of CIPS-SIGHAN Joint Conf. on Chinese Language Processing, pp. 20-26, 2010.