Developing a More Accurate Biomedical Literature Retrieval Method using Deep Learning and Citations in PubMed Central Full-text Articles

Published: 2021-10-23

Author:

Lo Chun-chao, Tian Shubo, Tao Yuchuan, Hao Jie, Zhang Jinfeng

Abstract

Most queries submitted to a literature search engine could be written more precisely as sentences, giving the search engine more specific information. In principle, sentence queries should be more effective than short queries with only a few keywords. Querying with full sentences is also a key step in question-answering and citation recommendation systems. Despite considerable progress in natural language processing (NLP) in recent years, sentence queries do not yield satisfactory results on current search engines. In this study, we developed a deep learning-based method for sentence queries, called DeepSenSe, using citation data available in full-text articles obtained from PubMed Central (PMC). A large amount of labeled data was generated from millions of matched citing sentences and cited articles, making it possible to train high-quality predictive models using modern deep learning techniques. We designed a two-stage approach: the first stage uses a modified BM25 algorithm to retrieve the top 1000 relevant articles, and the second stage re-ranks those articles using DeepSenSe. We tested our method on a large number of sentences extracted from real scientific articles in PMC. Our method performed substantially better than PubMed and Google Scholar for sentence queries.
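
The two-stage approach described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `BM25` class is the standard Okapi formulation (the paper's modifications are not described here and are not reproduced), `overlap_reranker` is a hypothetical placeholder standing in for the trained DeepSenSe model, and the tokenizer, function names, and toy documents are invented for the example.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


class BM25:
    """Standard Okapi BM25; the paper's modified variant is NOT reproduced."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        # Per-document term frequencies and lengths.
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.length = {doc_id: sum(c.values()) for doc_id, c in self.tf.items()}
        self.avg_len = sum(self.length.values()) / max(len(docs), 1)
        # Document frequency: number of documents containing each term.
        df = Counter(term for c in self.tf.values() for term in c)
        n = len(docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        tf, doc_len = self.tf[doc_id], self.length[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # term absent from this document
            norm = f + self.k1 * (1 - self.b + self.b * doc_len / self.avg_len)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int) -> List[str]:
        ranked = sorted(self.tf, key=lambda d: self.score(query, d), reverse=True)
        return ranked[:k]


def overlap_reranker(query: str, doc_text: str) -> float:
    """Hypothetical stand-in for a learned model: Jaccard token overlap."""
    q, d = set(tokenize(query)), set(tokenize(doc_text))
    return len(q & d) / len(q | d) if (q | d) else 0.0


def two_stage_search(
    query: str,
    index: BM25,
    docs: Dict[str, str],
    reranker: Callable[[str, str], float],
    k: int = 1000,
) -> List[str]:
    """Stage 1: BM25 candidate retrieval. Stage 2: re-rank the candidates."""
    candidates = index.top_k(query, k)
    return sorted(candidates, key=lambda d: reranker(query, docs[d]), reverse=True)


# Toy corpus (invented for illustration only).
docs = {
    "a": "bm25 ranking for medline retrieval",
    "b": "deep learning for protein structure prediction",
    "c": "query expansion using mesh terms in pubmed",
}
index = BM25(docs)
results = two_stage_search("query expansion in pubmed", index, docs, overlap_reranker, k=2)
print(results)
```

Separating the stages this way keeps the expensive scorer off the full collection: only the `k` candidates returned by the cheap lexical pass are scored by the re-ranker, which is presumably why the abstract caps the first stage at 1000 articles.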

Publisher

Cold Spring Harbor Laboratory
