Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner (Preprint)

Published: 2023-10-03

Author:

Butzin-Dozier Zachary, Ji Yunwen, Li Haodong, Coyle Jeremy, Shi Junming (Seraphina), Phillips Rachael V, Mertens Andrew, Pirracchio Romain, van der Laan Mark J, Patel Rena C, Colford John M, Hubbard Alan E

Abstract

Post-acute sequelae of COVID-19 (PASC), also known as Long COVID, is a broad grouping of long-term symptoms that follow acute COVID-19 infection. Understanding which characteristics predict future PASC is valuable because it can inform the identification of high-risk individuals and future preventive efforts. However, current knowledge of PASC risk factors is limited. Using a sample of 55,257 participants from the National COVID Cohort Collaborative, as part of the NIH Long COVID Computational Challenge, we sought to predict individual risk of PASC diagnosis from a curated set of clinically informed covariates. We predicted individual PASC status from covariate information using Super Learner (an ensemble machine learning algorithm also known as stacking), which learned the combination of gradient boosting and random forest algorithms that maximized the area under the receiver operating characteristic curve (AUC). We were able to predict individual PASC diagnoses accurately (AUC 0.947). Temporally, we found that baseline characteristics were more predictive of future PASC diagnosis than characteristics immediately before, during, or after COVID-19 infection. This finding supports the hypothesis that clinicians may be able to accurately assess the risk of PASC in patients prior to acute COVID diagnosis, which could improve early interventions and preventive care. We found that medical utilization, demographics, anthropometry, and respiratory factors were most predictive of PASC diagnosis, which highlights the importance of respiratory characteristics in PASC risk assessment. The methods outlined here provide an open-source, applied example of using Super Learner to predict PASC status from electronic health record data, which can be replicated across a variety of settings.
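
The meta-learning step described in the abstract, choosing an AUC-maximizing combination of two base learners, can be illustrated with a minimal sketch. This is not the authors' implementation: the out-of-fold predictions below are made-up toy numbers, the names `gbm_cv` and `rf_cv` are hypothetical stand-ins for cross-validated predictions from a gradient boosting model and a random forest, and a simple grid search over one convex weight is assumed in place of whatever optimizer the study used.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def best_convex_weight(labels, preds_a, preds_b, steps=100):
    """Grid-search the weight w in [0, 1] that maximizes the AUC of
    w * preds_a + (1 - w) * preds_b. Returns (weight, auc)."""
    best_w, best_auc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        combined = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        score = auc(labels, combined)
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc


# Toy data (assumption): 1 = PASC diagnosis, 0 = no diagnosis.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical out-of-fold predicted probabilities from each base learner.
gbm_cv = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]
rf_cv = [0.6, 0.8, 0.3, 0.2, 0.4, 0.1]

weight, ensemble_auc = best_convex_weight(labels, gbm_cv, rf_cv)
print("AUC, gradient boosting alone:", auc(labels, gbm_cv))
print("AUC, random forest alone:   ", auc(labels, rf_cv))
print("Chosen weight on gradient boosting:", weight)
print("AUC, weighted ensemble:     ", ensemble_auc)
```

Because the weights 0 and 1 are both on the grid, the selected ensemble can never score below the better single learner on the data used to choose the weight. In practice the weight should be learned from cross-validated predictions, as Super Learner does, so that the reported AUC is not inflated by overfitting.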

INTERNATIONAL REGISTERED REPORT

RR2-https://doi.org/10.1101/2023.07.27.23293272

Publisher

JMIR Publications Inc.
