Abstract
The comparison of documents—such as articles or patents search, bibliography recommendations systems, visualization of document collections, etc.—has a wide range of applications in several fields. One of the key tasks that such problems have in common is the evaluation of a similarity metric. Many such metrics have been proposed in the literature. Lately, deep learning techniques have gained a lot of popularity. However, it is difficult to analyze how those metrics perform against each other. In this paper, we present a systematic empirical evaluation of several of the most popular similarity metrics when applied to research articles. We analyze the results of those metrics in two ways, with a synthetic test that uses scientific papers and Ph.D. theses, and in a real-world scenario where we evaluate their ability to cluster papers from different areas of research.
Funder
Ministerio de Economía y Competitividad
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Reference38 articles.
1. The Digitization of the World from Edge to Core;Rydning,2018
2. Clustering by Compression
3. Neural Document Embeddings for Intensive Care Patient Mortality Prediction;Grnarova;arXiv,2016
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献