Affiliation:
1. Shenzhen Institute of Computing Sciences, China and University of Edinburgh, United Kingdom and Beihang University, China
2. Beihang University, China
3. National University of Defense Technology, China
Abstract
This paper proposes a notion of parametric simulation to link entities across a relational database \(\mathcal{D}\) and a graph \(G\). Taking functions and thresholds for measuring vertex closeness, path associations and important properties as parameters, parametric simulation identifies tuples \(t\) in \(\mathcal{D}\) and vertices \(v\) in \(G\) that refer to the same real-world entity, based on both topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation is in quadratic-time, by providing such an algorithm. Moreover, we develop an incremental algorithm for parametric simulation; we show that the incremental algorithm is bounded relative to its batch counterpart, i.e., it incurs the minimum cost for incrementalizing the batch algorithm. Putting these together, we develop HER, a parallel system to check whether \((t, v)\) makes a match, find all vertex matches of \(t\) in \(G\), and compute all matches across \(\mathcal{D}\) and \(G\), all in quadratic-time; moreover, HER supports incremental computation of these in response to updates to \(\mathcal{D}\) and \(G\). Using real-life and synthetic data, we empirically verify that HER is accurate with F-measure of 0.94 on average, and is able to scale with database \(\mathcal{D}\) and graph \(G\) for both batch and incremental computations.
Publisher
Association for Computing Machinery (ACM)