Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

Published: 2023-01-19

Authors:

Knafou Julien, Haas Quentin, Borissov Nikolay, Counotte Michel, Low Nicola, Imeri Hira, Ipekci Aziz Mert, Buitrago-Garcia Diana, Heron Leonie, Amini Poorya, Teodoro Douglas

Abstract

Background

The COVID-19 pandemic has led to an unprecedented volume of scientific publications, growing at a pace never seen before. Multiple living systematic reviews have been developed to provide professionals with up-to-date, trustworthy health information, but it is increasingly challenging for systematic reviewers to keep up with the evidence in electronic databases. We aimed to investigate deep learning-based algorithms for classifying COVID-19-related publications to help scale up the epidemiological curation process.

Methods

In this retrospective study, five different pre-trained deep learning language models were fine-tuned on a dataset of 6,365 publications manually classified into two classes, three subclasses and 22 sub-subclasses relevant for epidemiological triage. In a k-fold cross-validation setting, each standalone model was assessed on a classification task and compared against an ensemble, which takes the standalone model predictions as input and uses different strategies to infer the optimal article class. A ranking task was also considered, in which the model outputs a ranked list of sub-subclasses associated with the article.

Results

The ensemble model significantly outperformed the standalone classifiers, achieving an F1-score of 89.2 at the class level of the classification task. The gap between the standalone and ensemble models widens at the sub-subclass level, where the ensemble reaches a micro F1-score of 70% against 67% for the best-performing standalone model. For the ranking task, the ensemble obtained the highest recall@3, at 89%.
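The micro F1-score and recall@3 reported above are standard measures. As a rough illustration only, the Python sketch below uses one common definition of each: micro F1 pools true positives, false positives and false negatives over all articles and labels before computing precision and recall, and recall@k is taken here as the share of an article's gold sub-subclasses that appear among its top-k ranked labels, averaged over articles. The function names and toy labels are hypothetical; this is not the authors' evaluation code, and their exact definitions may differ.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions, pooling counts over all articles."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels predicted and correct
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def recall_at_k(gold: Sequence[Set[str]], ranked: Sequence[Sequence[str]], k: int = 3) -> float:
    """Mean share of each article's gold labels found among its top-k ranked labels."""
    scores = [len(g & set(r[:k])) / len(g) for g, r in zip(gold, ranked) if g]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical gold sub-subclasses, predicted sets and model rankings for two articles.
gold = [{"a"}, {"b", "c"}]
pred = [{"a"}, {"b", "d"}]
ranked = [["b", "a", "c"], ["d", "c", "e", "b"]]
print(round(micro_f1(gold, pred), 3))   # 0.667
print(recall_at_k(gold, ranked, k=3))   # 0.75
```

Pooling counts before averaging (micro) weights frequent sub-subclasses more heavily than a per-class (macro) average would, which matters with 22 unevenly populated sub-subclasses.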
Using a unanimity voting rule, the ensemble can provide predictions with higher confidence on a subset of the data, detecting original papers with an F1-score of up to 97% on 80% of the collection, compared with 93% on the whole dataset.

Conclusion

This study shows the potential of deep learning language models to triage COVID-19 references efficiently and to support epidemiological curation and review. The ensemble consistently and significantly outperforms every standalone model. Tuning the voting-strategy thresholds is a promising way to annotate a subset of the data with higher predictive confidence.
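The abstract does not detail the ensemble's voting strategies. The sketch below is a hedged illustration under assumed conditions: each standalone model outputs a single class label per article, three hypothetical models stand in for the five used in the study, and the labels "original" and "other" are made up. Majority voting always returns a label, whereas unanimity voting returns one only when every model agrees and otherwise abstains (None). Abstaining on disagreements is what allows an ensemble to trade coverage for higher confidence on the retained subset.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def unanimity_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the shared label if every model agrees, otherwise None (abstain)."""
    if labels and all(label == labels[0] for label in labels):
        return labels[0]
    return None


def ensemble_predict(per_model: Sequence[Sequence[str]], rule: str = "majority") -> List[Optional[str]]:
    """Combine predictions article by article; per_model[m][i] is model m's label for article i."""
    if rule not in ("majority", "unanimity"):
        raise ValueError(f"unknown rule: {rule}")
    vote = majority_vote if rule == "majority" else unanimity_vote
    return [vote(article_labels) for article_labels in zip(*per_model)]


def coverage(predictions: Sequence[Optional[str]]) -> float:
    """Fraction of articles for which the ensemble did not abstain."""
    return sum(p is not None for p in predictions) / len(predictions)


# Hypothetical predictions from three models on four articles.
per_model = [
    ["original", "original", "other", "original"],
    ["original", "other", "other", "original"],
    ["original", "original", "other", "other"],
]
print(ensemble_predict(per_model, "majority"))   # ['original', 'original', 'other', 'original']
unanimous = ensemble_predict(per_model, "unanimity")
print(unanimous, coverage(unanimous))            # ['original', None, 'other', None] 0.5
```

In this toy case unanimity keeps 2 of 4 articles (coverage 0.5); in the study, the unanimity rule retained about 80% of the collection.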

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article

1. DS4DH at MEDIQA-Chat 2023: Leveraging SVM and GPT-3 Prompt Engineering for Medical Dialogue Classification and Summarization; 2023-06-12