A pipeline for medical literature search and its evaluation

Authors:

Zafar Imamah1, Wali Aamir1 (ORCID), Kunwar Muhammad Ahmed1, Afzal Noor1, Raza Muhammad1

Affiliation:

1. Fast School of Computing, National University of Computer and Emerging Sciences, Pakistan

Abstract

One database commonly used by clinicians for searching the medical literature and practising evidence-based medicine is PubMed. As the literature grows, it has become challenging for users to find relevant material quickly because the relevant results are often not at the top. In this article, we propose a search and ranking pipeline to improve the relevance of search results. We first propose an ensemble model consisting of three classifiers: a bidirectional long short-term memory conditional random field (bi-LSTM-CRF), a support vector machine and naive Bayes, to recognise PICO (patient, intervention, comparison, outcome) elements in abstracts. The ensemble was trained on an annotated corpus of 5000 abstracts, split into 4000 training and 1000 testing abstracts, and recorded an accuracy of 93%. We then retrieved around 927,000 articles from PubMed for the years 2017–2021 (access date 16 April 2021). For every abstract, we extracted and grouped all P, I and O terms, and stored these groups along with the article ID in a separate database. During search, each P, I and O term of the query is searched only in its corresponding group of every abstract. The scoring method simply counts the number of matches between the query's P, I and O elements and the words in the P, I and O groups, respectively. The abstracts are sorted by the number of matches and the top five are listed using their pre-stored abstract IDs. A comprehensive user study was conducted in which 60 different queries were formulated and used to generate ranked search results from both PubMed and our proposed model. Five medical professionals assessed the ranked search results and marked every item as relevant or non-relevant. Both models were compared using the precision@K and mean-average-precision@K metrics with K = 5. For most queries, our model produced higher precision@K values than PubMed, and its mean-average-precision@K value is also higher (0.83 versus 0.67).
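
As a concrete illustration of the scoring step, the following is a minimal Python sketch of the match-count ranking described above, assuming a pre-built index that maps each article ID to its extracted P, I and O term groups. The index layout, identifiers and example terms are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-built index: article ID -> {"P": [...], "I": [...], "O": [...]}.
# In the paper these groups are produced once by the PICO ensemble and stored
# in a separate database; a plain dict stands in for that database here.
PicoIndex = Dict[str, Dict[str, List[str]]]

def score_article(query: Dict[str, List[str]], groups: Dict[str, List[str]]) -> int:
    """Count matches between the query's P, I and O terms and the article's
    corresponding groups; each term is searched only in its own group."""
    score = 0
    for element in ("P", "I", "O"):
        group = {term.lower() for term in groups.get(element, [])}
        score += sum(1 for term in query.get(element, []) if term.lower() in group)
    return score

def rank(query: Dict[str, List[str]], index: PicoIndex, k: int = 5) -> List[Tuple[str, int]]:
    """Sort all articles by match count and return the top-k (ID, score) pairs."""
    scored = [(pmid, score_article(query, groups)) for pmid, groups in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy example (made-up IDs and terms).
index: PicoIndex = {
    "PMID:0001": {"P": ["adults", "hypertension"], "I": ["lisinopril"], "O": ["blood pressure"]},
    "PMID:0002": {"P": ["children", "asthma"], "I": ["salbutamol"], "O": ["fev1"]},
}
query = {"P": ["adults", "hypertension"], "I": ["lisinopril"], "O": ["blood pressure"]}
print(rank(query, index))  # [('PMID:0001', 4), ('PMID:0002', 0)]
```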
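
The evaluation metrics are standard and can be stated precisely; below is a short sketch of precision@K and mean-average-precision@K over binary relevance judgements. Function names are illustrative, and the average-precision normalisation used here (dividing by the number of relevant items retrieved) is one common convention; the paper does not spell out its exact variant.

```python
from typing import List

def precision_at_k(relevance: List[int], k: int = 5) -> float:
    """Fraction of the top-k results judged relevant (1) versus non-relevant (0)."""
    return sum(relevance[:k]) / k

def average_precision_at_k(relevance: List[int], k: int = 5) -> float:
    """Average of precision@i over the ranks i (within the top k) at which
    a relevant result appears."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def mean_average_precision_at_k(judgements: List[List[int]], k: int = 5) -> float:
    """MAP@K: the mean of average precision@K across all queries."""
    return sum(average_precision_at_k(r, k) for r in judgements) / len(judgements)

# One query whose top-5 results were judged [1, 1, 0, 1, 0]:
# precision@5 = 3/5 = 0.6; AP@5 = (1/1 + 2/2 + 3/4) / 3 ≈ 0.917
print(precision_at_k([1, 1, 0, 1, 0]))          # 0.6
print(average_precision_at_k([1, 1, 0, 1, 0]))  # 0.9166...
```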

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

