Abstract
Purpose
Matching instances of the same entity, a task known as entity resolution, is a key step in data integration. This paper proposes a deep learning network that learns different representations of Web entities for entity resolution.
Design/methodology/approach
To match Web entities, the proposed network learns three representations of each entity: embeddings, which are vector representations of the entity's words in a low-dimensional space; convolutional vectors, produced by a convolutional layer, which capture short-distance patterns in the entity's word sequences; and bag-of-words vectors, created by a bag-of-words (BoW) layer that learns a weight for each word in the vocabulary based on the task at hand. Given a pair of entities, the similarity between their learned representations is used as a feature for a binary classifier that identifies a possible match. In addition to these features, the classifier uses a modification of inverse document frequency for pairs, which identifies discriminative words in pairs of entities.
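The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings, convolution filters, bag-of-words weights and classifier weights below are hypothetical hand-set values standing in for parameters that the network would learn, cosine similarity is assumed as the similarity measure, and the pairwise inverse document frequency feature is omitted because the abstract does not define it.

```python
import math

# Hypothetical toy parameters; in the described network these are learned.
EMBED = {"apple": [1.0, 0.0], "iphone": [0.9, 0.1],
         "case": [0.0, 1.0], "black": [0.1, 0.9]}
BOW_WEIGHT = {"apple": 0.5, "iphone": 2.0, "case": 1.5, "black": 0.3}
VOCAB = sorted(EMBED)
# Each filter spans a window of 2 consecutive words x 2 embedding dimensions.
CONV_FILTERS = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embedding_vector(tokens):
    """Average of the word embeddings of the known tokens."""
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def conv_vector(tokens):
    """Apply each filter to every 2-word window, ReLU, then max-pool."""
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    windows = [vecs[i] + vecs[i + 1] for i in range(len(vecs) - 1)]
    if not windows:
        return [0.0] * len(CONV_FILTERS)
    return [max(max(0.0, sum(w * x for w, x in zip(f, win)))
                for win in windows)
            for f in CONV_FILTERS]


def bow_vector(tokens):
    """Vocabulary-sized vector holding a per-word weight for present words."""
    present = set(tokens)
    return [BOW_WEIGHT[w] if w in present else 0.0 for w in VOCAB]


def pair_features(entity_a, entity_b):
    """One similarity feature per representation of the entity pair."""
    ta, tb = entity_a.lower().split(), entity_b.lower().split()
    return [cosine(embedding_vector(ta), embedding_vector(tb)),
            cosine(conv_vector(ta), conv_vector(tb)),
            cosine(bow_vector(ta), bow_vector(tb))]


def match_probability(features, weights=(2.0, 2.0, 2.0), bias=-3.0):
    """Logistic binary classifier over the similarity features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


same = match_probability(pair_features("Apple iPhone", "apple iphone"))
different = match_probability(pair_features("Apple iPhone", "black case"))
print(same > different)
```

Keeping one similarity score per representation lets the classifier weigh the three views independently, so a pair that shares few exact words can still match through its embedding or convolutional similarity.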
Findings
The proposed approach was evaluated on two commercial and two academic entity resolution benchmark data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and achieves results similar to those of its competitors on the academic data sets.
Originality/value
No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.
Subject
Computer Networks and Communications, Information Systems
Cited by
6 articles.