An Empirical Evaluation of Document Embeddings and Similarity Metrics for Scientific Articles

Published:2022-06-02 Issue:11 Volume:12 Page:5664
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Gómez Joaquin, Vázquez Pere-Pau

Abstract

Comparing documents has a wide range of applications in several fields, including article and patent search, bibliography recommendation systems, and the visualization of document collections. One of the key tasks these problems have in common is the evaluation of a similarity metric. Many such metrics have been proposed in the literature, and deep learning techniques have lately gained considerable popularity. However, it is difficult to analyze how those metrics perform against each other. In this paper, we present a systematic empirical evaluation of several of the most popular similarity metrics when applied to research articles. We analyze the results of those metrics in two ways: with a synthetic test that uses scientific papers and Ph.D. theses, and in a real-world scenario where we evaluate their ability to cluster papers from different areas of research.

Funder

Ministerio de Economía y Competitividad

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/12/11/5664/pdf

Reference: 38 articles.

1. The Digitization of the World from Edge to Core;Rydning,2018

2. Clustering by Compression

3. Neural Document Embeddings for Intensive Care Patient Mortality Prediction;Grnarova;arXiv,2016

Cited by 4 articles.

1. Improving Dimensionality Reduction Projections for Data Visualization;Applied Sciences;2023-09-04

2. Detecting Cross-Lingual Information Gaps in Wikipedia;Companion Proceedings of the ACM Web Conference 2023;2023-04-30

3. Predicting Personalized Textual Reviews via Collaborative Filtering using Document Embedding;28th International Conference on Intelligent User Interfaces;2023-03-27

4. Scenario Construction Model of Railway Traffic Accidents Based on Similarity Theory;Lecture Notes in Operations Research;2023