Reengineering a machine learning phenotype to adapt to the changing COVID-19 landscape: A study from the N3C and RECOVER consortia

Published: 2023-12-09

Author:

Crosskey Miles, McIntee Tomas, Preiss Sandy, Brannock Daniel, Yoo Yun Jae, Hadley Emily, Blancero Frank, Chew Rob, Loomba Johanna, Bhatia Abhishek, Chute Christopher G., Haendel Melissa, Moffitt Richard, Pfaff Emily

Abstract

Background: In 2021, we used the National COVID Cohort Collaborative (N3C) as part of the NIH RECOVER Initiative to develop a machine learning (ML) pipeline to identify patients with a high probability of having post-acute sequelae of SARS-CoV-2 infection (PASC), or Long COVID. However, the increased home testing, missing documentation, and reinfections that characterize the latter years of the pandemic necessitate reengineering our original model to account for these changes in the COVID-19 research landscape.

Methods: Our updated XGBoost model gathers data for each patient in overlapping 100-day periods that progress through time, and issues a probability of Long COVID for each 100-day period. If a patient has known acute COVID-19 during any 100-day window (including reinfections), we censor the data from 7 days prior to the diagnosis/positive test date through 28 days after. These fixed time windows replace the prior model’s reliance on a documented COVID-19 index date to anchor its data collection, and are able to account for reinfections.

Results: The updated model achieves an area under the receiver operating characteristic curve of 0.90. Precision and recall can be adjusted according to a given use case, depending on whether greater sensitivity or specificity is warranted.

Discussion: By eschewing the COVID-19 index date as an anchor point for analysis, we are now able to assess the probability of Long COVID among patients who may have tested at home, or with suspected (but untested) cases of COVID-19, or multiple SARS-CoV-2 reinfections. We view this exercise as a model for maintaining and updating any ML pipeline used for clinical research and operations.
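The windowing and censoring described in the Methods portion of the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The 100-day window length and the 7-day/28-day censoring bounds come from the abstract; the 50-day stride, the inclusive date boundaries, the `(date, code)` record shape, and the function names `make_windows`, `censored_intervals`, and `usable_records` are all assumptions made for illustration. The sketch also censors any record that falls inside a censoring interval, even when the infection date itself lies just outside the window, which the abstract does not specify.

```python
from datetime import date, timedelta

WINDOW_DAYS = 100        # from the abstract
CENSOR_BEFORE_DAYS = 7   # from the abstract
CENSOR_AFTER_DAYS = 28   # from the abstract
STEP_DAYS = 50           # assumption: the abstract does not state the stride


def make_windows(start, end, window_days=WINDOW_DAYS, step_days=STEP_DAYS):
    """Return overlapping (first_day, last_day) windows, inclusive, within [start, end]."""
    windows = []
    current = start
    while current + timedelta(days=window_days - 1) <= end:
        windows.append((current, current + timedelta(days=window_days - 1)))
        current += timedelta(days=step_days)
    return windows


def censored_intervals(infection_dates):
    """Return one (first_day, last_day) censoring interval per known acute infection."""
    return [
        (d - timedelta(days=CENSOR_BEFORE_DAYS), d + timedelta(days=CENSOR_AFTER_DAYS))
        for d in infection_dates
    ]


def usable_records(records, window, infection_dates):
    """Keep (date, code) records inside the window and outside every censoring interval."""
    w_start, w_end = window
    blocked = censored_intervals(infection_dates)
    kept = []
    for rec_date, code in records:
        if not (w_start <= rec_date <= w_end):
            continue  # record is outside this 100-day window
        if any(lo <= rec_date <= hi for lo, hi in blocked):
            continue  # record falls in an acute-infection censoring interval
        kept.append((rec_date, code))
    return kept


# Hypothetical patient: made-up dates and placeholder codes, one known infection.
windows = make_windows(date(2022, 1, 1), date(2022, 12, 31))
records = [
    (date(2022, 1, 10), "code_a"),
    (date(2022, 1, 25), "code_b"),
    (date(2022, 2, 15), "code_c"),
    (date(2022, 3, 1), "code_d"),
    (date(2022, 3, 2), "code_e"),
    (date(2022, 5, 1), "code_f"),
]
kept = usable_records(records, windows[0], [date(2022, 2, 1)])
```

In a full pipeline, the records kept for each window would be turned into features and scored by the trained classifier, giving one probability per window. Because no index date is needed to build the windows, the same procedure applies to patients with no documented infection at all: `infection_dates` is simply empty and nothing is censored.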

Publisher

Cold Spring Harbor Laboratory


Cited by 1 article.

1. Effect of Paxlovid Treatment During Acute COVID-19 on Long COVID Onset: An EHR-Based Target Trial Emulation from the N3C and RECOVER Consortia (2024-01-22)