Abstract
This paper proposes IC+SP, a hybrid model for improving Information Content (IC)-related metrics of semantic similarity between words. It rests on the hypothesis that IC and the shortest path are two relatively independent sources of semantic evidence with approximately equal influence on the similarity metric. IC+SP linearly combines an IC-related metric with the shortest path. The semantic similarity of concepts is transformed into that of words by maximizing every component of IC+SP. Thirteen improved IC-related metrics based on IC+SP are formed and implemented on the experimental platform HESML (Lastra-Díaz et al., Inf Syst 66:97–118, 2017). To evaluate IC+SP, Pearson’s and Spearman’s correlation coefficients of the improved metrics on well-accepted benchmarks are compared with those of the original metrics. I introduce the Wilcoxon signed-rank test, which needs no normal-distribution assumption, whereas the t-test requires that assumption for small samples. Both the t-test and the Wilcoxon signed-rank test are conducted on the differences between the correlation coefficients of the improved and original metrics. The improved IC-related metrics are expected to significantly outperform their original counterparts, and the experimental results, including comparisons of the mean and maximum correlation coefficients as well as the p-values and confidence intervals of both tests, meet this expectation in the vast majority of cases.
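As an illustration only, the combination described in the abstract can be sketched as follows. The abstract gives no exact formula, so several details here are assumptions: the equal weight of 0.5 (suggested by the "approximately equal influence" hypothesis), the transform 1/(1 + d) that turns a shortest-path length d into a similarity, and all names and toy values (`ic_sp_word_similarity`, `path_similarity`, the sense labels and scores), which are hypothetical and not taken from the paper or from HESML.

```python
from itertools import product


def path_similarity(path_length):
    """Map a shortest-path length to a similarity in (0, 1].

    Assumed transform for this sketch; the abstract does not specify one.
    """
    return 1.0 / (1.0 + path_length)


def ic_sp_word_similarity(senses_a, senses_b, ic_sim, path_len, weight=0.5):
    """Word-level IC+SP sketch: maximize each component over all sense
    pairs separately, then combine the two maxima linearly."""
    pairs = list(product(senses_a, senses_b))
    best_ic = max(ic_sim(a, b) for a, b in pairs)
    best_sp = max(path_similarity(path_len(a, b)) for a, b in pairs)
    return weight * best_ic + (1.0 - weight) * best_sp


# Hypothetical toy data: two senses of one word, one sense of another.
IC_SCORES = {("bank#1", "money#1"): 0.2, ("bank#2", "money#1"): 0.6}
PATH_LENGTHS = {("bank#1", "money#1"): 1, ("bank#2", "money#1"): 3}

score = ic_sp_word_similarity(
    ["bank#1", "bank#2"],
    ["money#1"],
    ic_sim=lambda a, b: IC_SCORES[(a, b)],
    path_len=lambda a, b: PATH_LENGTHS[(a, b)],
)
print(round(score, 4))
```

In this toy case the best IC score and the shortest path come from different sense pairs, which is one way to read "maximizing every component" as opposed to maximizing the combined score over a single pair.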
Funder
National Natural Science Foundation of China
Hubei Provincial Natural Science Foundation
Key Laboratory of Dynamic Cognitive System of Electromagnetic Spectrum Space
Publisher
Springer Science and Business Media LLC
References (63 articles)
1. Lastra-Díaz JJ, García-Serrano A, Batet M, Fernández M, Chirigati F (2017) HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Inf Syst 66:97–118
2. Harispe S, Ranwez S, Janaqi S, Montmain J (2015) Semantic similarity from natural language and ontology analysis. Synth Lect Hum Lang Technol 8(1):1–254
3. Hovy E, Navigli R, Ponzetto SP (2013) Collaboratively built semi-structured content and artificial intelligence: the story so far. Artif Intell 194:2–27
4. Wei T, Lu Y, Chang H, Zhou Q, Bao X (2015) A semantic approach for text clustering using wordnet and lexical chains. Expert Syst Appl 42(4):2264–2275
5. Moro A, Raganato A, Navigli R (2014) Entity linking meets word sense disambiguation: a unified approach. Trans Assoc Comput Linguist 2:231–244