A Deep Learning Approach to Refine the Identification of High-Quality Clinical Research Articles From the Biomedical Literature: Protocol for Algorithm Development and Validation

Published:2021-11-29 Issue:11 Volume:10 Page:e29398
ISSN:1929-0748
Container-title:JMIR Research Protocols
language:en
Short-container-title:JMIR Res Protoc

Author:

Abdelkader Wael, Navarro Tamara, Parrish Rick, Cotoi Chris, Germini Federico, Linkins Lori-Ann, Iorio Alfonso, Haynes R Brian, Ananiadou Sophia, Chu Lingyang, Lokker Cynthia

Abstract

Background: A barrier to practicing evidence-based medicine is the rapidly increasing body of biomedical literature. Using method terms to limit a search can reduce the burden of screening articles for clinical relevance; however, such terms are limited by their partial dependence on indexing terms and usually yield low precision, especially when high sensitivity is required. Machine learning has been applied to the identification of high-quality literature, with the potential to achieve high precision without sacrificing sensitivity. The use of artificial intelligence has shown promise for improving the efficiency of identifying sound evidence.

Objective: The primary objective of this research is to derive and validate deep learning models using variants of Bidirectional Encoder Representations from Transformers (BERT) to retrieve high-quality, high-relevance evidence for clinical consideration from the biomedical literature.

Methods: Using the HuggingFace Transformers library, we will experiment with variations of BERT models, including BERT, BioBERT, BlueBERT, and PubMedBERT, to determine which performs best at identifying articles based on quality criteria. Our experiments will use a large data set of over 150,000 PubMed citations from 2012 to 2020 that have been manually labeled based on their methodological rigor for clinical use. We will evaluate and report on the performance of the classifiers in categorizing articles based on their likelihood of meeting quality criteria. We will report the fine-tuning hyperparameters for each model, as well as their performance metrics, including recall (sensitivity), specificity, precision, accuracy, F-score, the number of articles that need to be read before finding one that is positive (meets criteria), and classification probability scores.

Results: Initial model development is underway, with further development planned for early 2022. Performance testing is expected to start in February 2022. Results will be published in 2022.

Conclusions: The experiments will aim to improve the precision of retrieving high-quality articles by applying a machine learning classifier to PubMed searching.

International Registered Report Identifier (IRRID): DERR1-10.2196/29398
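
The performance metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the names `ConfusionCounts` and `classification_metrics` are hypothetical, the example counts are invented for illustration, the F-score is assumed to be the balanced F1, and the number needed to read is assumed to be the reciprocal of precision.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for binary classification counts."""
    tp: int  # articles meeting criteria, predicted positive
    fp: int  # articles not meeting criteria, predicted positive
    tn: int  # articles not meeting criteria, predicted negative
    fn: int  # articles meeting criteria, predicted negative


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the metrics listed in the abstract from confusion counts."""
    recall = safe_div(c.tp, c.tp + c.fn)        # sensitivity
    specificity = safe_div(c.tn, c.tn + c.fp)
    precision = safe_div(c.tp, c.tp + c.fp)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # Assumption: "F-score" is taken to mean the balanced F1 score.
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Assumption: number needed to read = 1 / precision, i.e. how many
    # retrieved articles must be read on average to find one positive.
    number_needed_to_read = safe_div(1.0, precision)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "number_needed_to_read": number_needed_to_read,
    }


# Invented counts for illustration only; not results from the study.
example = ConfusionCounts(tp=80, fp=120, tn=780, fn=20)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Under this definition, a precision of 0.4 corresponds to reading 2.5 retrieved articles on average to find one that meets the quality criteria, which is why raising precision at a fixed sensitivity directly reduces screening effort.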

Publisher

JMIR Publications Inc.

Subject

General Medicine

Cited by 9 articles.

1. The McMaster Health Information Research Unit: Over a Quarter-Century of Health Informatics Supporting Evidence-Based Medicine; Journal of Medical Internet Research; 2024-07-31

2. What You May Have Missed in 2023: Keeping Up With the Constant Flow of New Medical Evidence; Annals of Internal Medicine; 2024-05

3. The McMaster Health Information Research Unit: Over a Quarter-Century of Health Informatics Supporting Evidence-Based Medicine (Preprint); 2024-03-28

4. A hybrid machine learning and natural language processing model for early detection of acute coronary syndrome; Healthcare Analytics; 2023-12

5. Protocol for a living evidence synthesis on variants of concern and COVID-19 vaccine effectiveness; Vaccine; 2023-10