The external validity of machine learning-based prediction scores from hematological parameters of COVID-19: A study using hospital records from Brazil, Italy, and Western Europe

Published: 2023-03-08

Author:

Safdari Ali Akbar, Keshav Chanda Sai, Mody Deepanshu, Verma Kshitij, Kaushal Utsav, Burra Vaadeendra Kumar, Ray Sibnath, Bandyopadhyay Debashree

Abstract

Background

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, is the deadliest threat to humankind in recent times. The gold standard for its detection, quantitative Real-Time Polymerase Chain Reaction (qRT-PCR), has several limitations regarding experimental handling, expense, and time. While the hematochemical values of routine blood tests have been reported as a faster and cheaper alternative, the external validity of such models on diverse populations has yet to be thoroughly investigated. Here we studied the external validity of machine learning-based prediction scores from hematological parameters recorded in Brazil, Italy, and Western Europe.

Methods and Findings

The publicly available hematological records (raw sample size (n) = 195554) from hospitals in three different territories, Brazil, Italy, and Western Europe, were preprocessed to develop the training, testing, and prediction cohorts for ML models. Seven different ML classifiers were trained on a total of eight (sub)datasets. The XGBoost classifier performed consistently better than the others on all the datasets, producing eight different models. The working models use a set of either four or fourteen hematological parameters. The internal performances of the XGBoost models (AUC scores ranging from 84% to 97%) were superior to those of the ML models reported in the literature for a few datasets (AUC scores ranging from 84% to 87%). The external performance (AUC score) was 86% when the model was trained and tested on fourteen hematological parameters obtained from the same country (Brazil) but from independent datasets. However, the external performances were reduced when tested across populations: 69% when trained on datasets from Italy (n=1736) and tested on datasets from Brazil (n=602), and 65% when trained on datasets from Italy and tested on datasets from Western Europe (n=1587).

Conclusion

For the first time, this report showed that models trained and tested on the same population, but on separate records, produced reasonably accurate results. The study supports confidence in these models when they are trained and tested within the same population, and they have the potential to be extended to other demographic locations. Both the four- and fourteen-parameter models are publicly available at https://covipred.bits-hyderabad.ac.in/home

Author Summary

COVID-19 has posed the deadliest threat to the human population in the 21st century. Timely detection of the disease could save more lives. The RT-PCR test is considered the gold standard for COVID-19 detection. However, the technique has several limitations, which motivates developing an alternate detection protocol that is efficient, fast, and cheap. Hematology-based Machine-Learning (ML) prediction is one of several such alternate detection techniques. All the hematology-based predictions reported so far in the literature were only internally validated. Considering the need for a rapid, near-accurate, and cheaper COVID-19 detection technique, we aim to externally validate the hematology-based ML prediction. Here, external validation means using two independent datasets for model training and testing, in contrast to internal validation, where a single dataset is split into train and test sets. We have integrated published clinical records from hospitals in Brazil, Italy, and Western Europe. Internal ML model performances are superior to those reported in the literature. The external model performances were equivalent to the internal performances when the models were trained and tested on the same population. However, the external performances were inferior when the train and test sets came from different populations. The results support the utility of these models within the same population. However, they also caution against training a model on one population and testing it on another. The outcome of this work has the potential to serve as an initial screen for COVID-19, based on hematological parameters, before qRT-PCR tests.
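
The difference between internal and external validation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the cohorts are synthetic, the names (`make_cohort`, `fit_mean_difference`, `site_a`, `site_b`) are hypothetical, and a simple mean-difference linear scorer stands in for the XGBoost classifier so that the example needs only the Python standard library. It is not the authors' pipeline, and the numbers it prints say nothing about the study's results.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_cohort(n, informative, rng):
    """Synthetic cohort (hypothetical): two features per record, and only
    the feature at index `informative` differs between the two classes."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        x[informative] += 1.5 * y
        rows.append(x)
        labels.append(y)
    return rows, labels

def fit_mean_difference(rows, labels):
    """Stand-in model (not XGBoost): weights = mean(positives) - mean(negatives)."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    return [sum(r[d] for r in pos) / len(pos) - sum(r[d] for r in neg) / len(neg)
            for d in range(len(rows[0]))]

def score(weights, rows):
    """Linear score for each record."""
    return [sum(w * v for w, v in zip(weights, r)) for r in rows]

rng = random.Random(0)
site_a = make_cohort(400, 0, rng)      # population A, used for training
site_a_new = make_cohort(400, 0, rng)  # independent records, same population
site_b = make_cohort(400, 1, rng)      # different population: another feature carries the signal

# Internal validation: one dataset split into train and test parts.
train_rows, train_labels = site_a[0][:300], site_a[1][:300]
held_rows, held_labels = site_a[0][300:], site_a[1][300:]
weights = fit_mean_difference(train_rows, train_labels)
internal_auc = auc(held_labels, score(weights, held_rows))

# External validation: the trained model is applied to independent datasets.
external_same = auc(site_a_new[1], score(weights, site_a_new[0]))
external_cross = auc(site_b[1], score(weights, site_b[0]))

print(f"internal: {internal_auc:.2f}")
print(f"external, same population: {external_same:.2f}")
print(f"external, other population: {external_cross:.2f}")
```

The cross-population cohort is built so that the informative feature changes, which is only one of several ways a population shift could degrade a model. The sketch shows the evaluation protocol, not the cause of the drop reported in the study.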

Publisher

Cold Spring Harbor Laboratory
