A study of concept similarity in Wikidata

Published: 2024-05-14 Issue: 3 Volume: 15 Page: 877-896
ISSN: 2210-4968
Container-title: Semantic Web
Short-container-title: SW

Author:

Ilievski Filip¹, Shenoy Kartik¹, Chalupsky Hans¹, Klein Nicholas¹, Szekely Pedro¹

Affiliation:

1. Information Sciences Institute, University of Southern California, CA, USA

Abstract

Robust estimation of concept similarity is crucial for applications of AI in the commercial, biomedical, and publishing domains, among others. While the related task of word similarity has been extensively studied, resulting in a wide range of methods, estimating concept similarity between nodes in Wikidata has not been considered so far. In light of the adoption of Wikidata for increasingly complex tasks that rely on similarity, and its unique size, breadth, and crowdsourcing nature, we propose that conceptual similarity should be revisited for the case of Wikidata. In this paper, we study a wide range of representative similarity methods for Wikidata, organized into three categories, and leverage background information for knowledge injection via retrofitting. We measure the impact of retrofitting with different weighted subsets from Wikidata and ProBase. Experiments on three benchmarks show that the best performance is achieved by pairing language models with rich information, whereas the impact of injecting knowledge is most positive on methods that originally do not consider comprehensive information. The performance of retrofitting is conditioned on the selection of high-quality similarity knowledge. A key limitation of this study, similar to prior work, lies in the limited size and scope of the similarity benchmarks. While Wikidata provides an unprecedented possibility for a representative evaluation of concept similarity, effectively doing so remains a key challenge.
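
For readers unfamiliar with retrofitting, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes the commonly used iterative update in which each vector is pulled toward a weighted average of its neighbours while staying anchored to its original value. The function names, the `alpha` anchor weight, the toy vectors, and the node labels are all hypothetical; in practice the nodes would be Wikidata items and the weighted pairs would come from a similarity resource.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrofit(embeddings, neighbors, alpha=1.0, iterations=10):
    """Return retrofitted copies of `embeddings`.

    embeddings: dict mapping node -> list of floats (original vectors)
    neighbors:  dict mapping node -> list of (neighbor, weight) pairs
    alpha:      weight that anchors a node to its original vector (assumed)
    """
    original = {n: list(v) for n, v in embeddings.items()}
    current = {n: list(v) for n, v in embeddings.items()}
    for _ in range(iterations):
        for node, edges in neighbors.items():
            # Ignore neighbours that have no embedding.
            edges = [(m, w) for m, w in edges if m in current]
            if node not in current or not edges:
                continue
            total = alpha + sum(w for _, w in edges)
            updated = [alpha * x for x in original[node]]
            for m, w in edges:
                for k, x in enumerate(current[m]):
                    updated[k] += w * x
            current[node] = [x / total for x in updated]
    return current


# Toy example: two concepts declared similar, one left untouched.
vectors = {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "car": [-1.0, 0.0]}
pairs = {"dog": [("cat", 1.0)], "cat": [("dog", 1.0)]}
fitted = retrofit(vectors, pairs)
print(cosine(vectors["dog"], vectors["cat"]), cosine(fitted["dog"], fitted["cat"]))
```

In this sketch, nodes without any similarity pairs keep their original vectors, which is consistent with the abstract's point that the outcome depends on which similarity knowledge is selected for injection.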

Publisher

IOS Press

Cited by 2 articles.

1. Do Similar Entities Have Similar Embeddings?;Lecture Notes in Computer Science;2024

2. Bringing Back Semantics to Knowledge Graph Embeddings: An Interpretability Approach;Lecture Notes in Computer Science;2024