A Semantic Similarity Measure for Scholarly Document Based on the Study of n-gram
Published:2022-12-28
ISSN:1544-5976
Container-title:Journal of Web Engineering
Short-container-title:JWE
Author:
Samen Yannick-Ulrich Tchantchou
Abstract
The performance of information retrieval systems depends closely on how accurately similarity measures determine the similarity between documents, or between a query and a document. This paper addresses similarity measures in the context of scholarly documents and proposes a semantic similarity measure that exploits both the metadata contained in scientific articles and the important n-grams identified in them. To evaluate its accuracy, a dataset of articles is built, together with similarity values estimated manually by human experts. Experiments on this dataset using Pearson correlation show that the similarity values produced by the proposed measure are very close to those estimated by the human experts.
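The abstract does not give the formula of the proposed measure, so the following is only an illustrative sketch of the general idea: combine an n-gram overlap score with a metadata overlap score, then compare the results with human judgments using Pearson correlation. The Jaccard overlap, the `weight` parameter, the `text` and `keywords` fields, and all function names are assumptions made for this example, not the author's method.

```python
from math import sqrt


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) found in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(doc_a, doc_b, n=2, weight=0.5):
    """Hypothetical score: weighted mix of n-gram and keyword overlap.

    Each document is a dict with a "text" string and a "keywords" list.
    `weight` is an assumed tuning parameter, not taken from the paper.
    """
    ngram_sim = jaccard(word_ngrams(doc_a["text"], n),
                        word_ngrams(doc_b["text"], n))
    meta_sim = jaccard(set(doc_a["keywords"]), set(doc_b["keywords"]))
    return weight * ngram_sim + (1 - weight) * meta_sim


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In an evaluation of the kind described, `combined_similarity` would be computed for each pair of articles and `pearson` applied to the list of computed scores and the list of expert-assigned scores; a coefficient near 1 would indicate close agreement.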
Publisher
River Publishers
Subject
Computer Networks and Communications, Information Systems, Software