Efficient Self-Supervised Metric Information Retrieval: A Bibliography Based Method Applied to COVID Literature

Published: 2021-09-26 Issue: 19 Volume: 21 Page: 6430
ISSN: 1424-8220
Container-title: Sensors
Language: en

Author:

Moro Gianluca, Valgimigli Lorenzo

Abstract

The literature on coronaviruses counts more than 300,000 publications. Finding papers relevant to arbitrary queries is essential to discovering helpful knowledge. The current best information retrieval (IR) systems use deep learning approaches and need supervised training sets with labeled data, which means knowing a priori the queries and their corresponding relevant papers. Creating such labeled datasets is time-consuming and requires the effort of prominent experts, resources that are scarce under the time pressure of a pandemic. We present a new self-supervised solution, called SUBLIMER, that does not require labels to learn to search corpora of scientific papers for those most relevant to arbitrary queries. SUBLIMER is a novel, efficient IR engine trained on the unlabeled COVID-19 Open Research Dataset (CORD19) using deep metric learning. The core of our self-supervised approach is that it uses no labels but exploits the bibliographic citations in papers to create a latent space where spatial proximity is a metric of semantic similarity; for this reason, it can also be applied to corpora of papers in other domains. Despite being self-supervised, SUBLIMER outperforms the state-of-the-art competitors on CORD19 in Precision@5 (P@5) and Bpref; unlike our approach, those competitors require both labeled datasets and a number of trainable parameters an order of magnitude higher than ours.
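
The idea of using citations as a free training signal can be illustrated with a small sketch. This is not SUBLIMER's actual implementation: the abstract does not state which loss or sampling scheme is used, so the triplet margin loss, the negative-sampling rule, and all function names below are assumptions chosen for illustration. The sketch shows how citation links could yield (anchor, positive, negative) triplets for deep metric learning, and how Precision@k is computed on a ranked result list.

```python
import numpy as np


def citation_triplets(citations, paper_ids):
    """Yield (anchor, positive, negative) paper ids.

    Assumption for illustration: a paper cited by the anchor is a positive,
    and any other paper the anchor does not cite is a negative.
    """
    for anchor, cited in citations.items():
        for pos in sorted(cited):
            for neg in paper_ids:
                if neg != anchor and neg not in cited:
                    yield anchor, pos, neg


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is at least `margin`
    farther from the anchor than the positive (Euclidean distance)."""
    d_pos = np.linalg.norm(np.asarray(anchor) - np.asarray(positive))
    d_neg = np.linalg.norm(np.asarray(anchor) - np.asarray(negative))
    return max(0.0, float(d_pos - d_neg + margin))


def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k


# Toy data (hypothetical): paper A cites paper B; paper C is unrelated.
embeddings = {"A": [0.0, 0.0], "B": [0.0, 1.0], "C": [3.0, 4.0]}
citations = {"A": {"B"}}

triplets = list(citation_triplets(citations, ["A", "B", "C"]))
losses = [
    triplet_margin_loss(embeddings[a], embeddings[p], embeddings[n])
    for a, p, n in triplets
]
```

In a real system the embeddings would come from a trainable encoder and the loss would be minimized by gradient descent; here the vectors are fixed so that the geometry is easy to check by hand.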

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/19/6430/pdf

Reference: 72 articles.

1. RoBERTa: A Robustly Optimized BERT Pretraining Approach;Liu;arXiv,2019

2. SciBERT: A Pretrained Language Model for Scientific Text

3. BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Cited by 18 articles.

1. A Study on the Integration of Big Data With Information Retrieval Technology in the Construction of Translation Talent Pools;International Journal of e-Collaboration;2024-08-26

2. Document‐to‐Document Retrieval Using Self‐Retrieval Learning and Automatic Keyword Extraction;IEEJ Transactions on Electrical and Electronic Engineering;2024-08-20

3. Evidence, my Dear Watson: Abstractive dialogue summarization on learnable relevant utterances;Neurocomputing;2024-03

4. Preliminary guideline for reporting bibliometric reviews of the biomedical literature (BIBLIO): a minimum requirements;Systematic Reviews;2023-12-15

5. Multi-language transfer learning for low-resource legal case summarization;Artificial Intelligence and Law;2023-09-25