The external validity of machine learning-based prediction scores from hematological parameters of COVID-19: A study using hospital records from Brazil, Italy, and Western Europe

Authors:

Safdari Ali Akbar, Keshav Chanda Sai, Mody Deepanshu, Verma Kshitij, Kaushal Utsav, Burra Vaadeendra Kumar, Ray Sibnath, Bandyopadhyay Debashree

Abstract

Background

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, is the deadliest threat to humankind in recent times. The gold standard for its detection, quantitative real-time polymerase chain reaction (qRT-PCR), has several limitations in experimental handling, expense, and time. Although hematochemical values from routine blood tests have been reported as a faster and cheaper alternative, the external validity of such models on diverse populations has yet to be thoroughly investigated. Here we studied the external validity of machine learning (ML)-based prediction scores from hematological parameters recorded in Brazil, Italy, and Western Europe.

Methods and Findings

Publicly available hematological records (raw sample size n = 195,554) from hospitals in three territories, Brazil, Italy, and Western Europe, were preprocessed to build the training, testing, and prediction cohorts for the ML models. A total of eight (sub)datasets were used to train seven different ML classifiers. The XGBoost classifier performed consistently better on all datasets, yielding eight different models. The working models use a set of either four or fourteen hematological parameters. The internal performances of the XGBoost models (AUC scores from 84% to 97%) were superior to those of the ML models reported in the literature for a few datasets (AUC scores from 84% to 87%). The external performance (AUC score) was 86% when the model was trained and tested on fourteen hematological parameters obtained from the same country (Brazil) but from independent datasets.
However, external performance dropped when models were tested across populations: 69% when trained on datasets from Italy (n = 1736) and tested on datasets from Brazil (n = 602), and 65% when trained on datasets from Italy and tested on datasets from Western Europe (n = 1587).

Conclusion

For the first time, this report showed that models trained and tested on the same population, but on separate records, produced reasonably accurate results. The study supports confidence in these models when they are trained and tested within the same population, and suggests potential for extending them to other demographic locations. Both the four- and fourteen-parameter models are publicly available at https://covipred.bits-hyderabad.ac.in/home

Author Summary

COVID-19 has posed the deadliest threat to the human population in the 21st century. Timely detection of the disease could save more lives. The RT-PCR test is considered the gold standard for COVID-19 detection. However, the technique has several limitations that motivate the development of an alternative detection protocol that is efficient, fast, and cheap. Hematology-based machine learning (ML) prediction is one of several such alternative techniques. All hematology-based predictions reported so far in the literature were only internally validated. Given the need for a rapid, near-accurate, and cheaper COVID-19 detection protocol, we aim to externally validate hematology-based ML prediction. Here, external validation means using two independent datasets for model training and testing, in contrast to internal validation, where a single dataset is split into training and test sets. We integrated published clinical records from hospitals in Brazil, Italy, and Western Europe. Our internal ML model performances are superior to those reported in the literature.
The external model performances were equivalent to the internal performances when the models were trained and tested on the same population. However, external performances were inferior when the training and test sets came from different populations. The results support the utility of these models within the same population, but they also caution against training a model on one population and testing it on another. This work could provide an initial COVID-19 screen based on hematological parameters before qRT-PCR testing.
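A minimal sketch, assuming Python with only the standard library, of the distinction drawn above between internal validation (one dataset split into training and test sets) and external validation (training on one cohort and testing on an independent one), scored with ROC AUC as in the abstract. This is not the authors' pipeline: the XGBoost classifier is deliberately replaced by a hypothetical nearest-centroid scorer, the cohorts are synthetic random records rather than the Brazilian, Italian, or Western European data, and every function name (`roc_auc`, `fit_centroid_scorer`, `internal_validation`, `external_validation`, `make_synthetic_cohort`) is illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (feature vector of hematological parameters, label: 1 = COVID-19 positive, 0 = negative)
Record = Tuple[List[float], int]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(train: Sequence[Record]) -> Callable[[List[float]], float]:
    """Hypothetical stand-in for XGBoost: score = distance to negative centroid - distance to positive centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in train if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a: List[float], b: List[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    c_pos, c_neg = centroid(1), centroid(0)
    return lambda x: dist(x, c_neg) - dist(x, c_pos)


def evaluate(train: Sequence[Record], test: Sequence[Record]) -> float:
    """Fit on `train`, then report AUC on `test`."""
    scorer = fit_centroid_scorer(train)
    return roc_auc([y for _, y in test], [scorer(x) for x, _ in test])


def internal_validation(cohort: Sequence[Record], test_fraction: float = 0.2, seed: int = 0) -> float:
    """Internal validation: shuffle ONE dataset and split it into train and test sets."""
    rows = list(cohort)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return evaluate(rows[:cut], rows[cut:])


def external_validation(train_cohort: Sequence[Record], test_cohort: Sequence[Record]) -> float:
    """External validation: train on one dataset, test on an independent one."""
    return evaluate(train_cohort, test_cohort)


def make_synthetic_cohort(n: int, offset: float, seed: int) -> List[Record]:
    """Synthetic 4-feature records (NOT study data); `offset` crudely mimics a population-level shift."""
    rng = random.Random(seed)
    cohort: List[Record] = []
    for i in range(n):
        y = i % 2  # alternate labels so both classes are present
        x = [rng.gauss(1.0 * y + offset, 1.0) for _ in range(4)]
        cohort.append((x, y))
    return cohort


if __name__ == "__main__":
    cohort_a = make_synthetic_cohort(400, offset=0.0, seed=1)
    cohort_b = make_synthetic_cohort(400, offset=0.8, seed=2)
    print(f"internal AUC (cohort A split): {internal_validation(cohort_a):.3f}")
    print(f"external AUC (train A, test B): {external_validation(cohort_a, cohort_b):.3f}")
```

The `offset` parameter is only a toy device for shifting one cohort relative to another; it does not model the actual differences between the study's populations, and no particular AUC values should be read into this sketch.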

Publisher

Cold Spring Harbor Laboratory
