Abstract
Background
Pediatric asthma is a heterogeneous disease; however, current characterizations of its subtypes are limited. Machine learning (ML) methods are well suited for identifying subtypes. In particular, deep neural networks can learn patient representations by leveraging longitudinal information captured in electronic health records (EHRs) while considering future outcomes. However, the traditional approach to subtype analysis requires large amounts of EHR data, which may contain protected health information and therefore raise concerns about patient privacy. Federated learning is a key technology for addressing these privacy concerns while preserving the accuracy and performance of ML algorithms. It could enable multisite development and implementation of ML algorithms, facilitating the translation of artificial intelligence into clinical practice.
Objective
The aim of this study is to develop a research protocol for implementation of federated ML across a large clinical research network to identify and discover pediatric asthma subtypes and their progression over time.
Methods
This mixed methods study uses data and clinicians from the OneFlorida+ clinical research network, which is a large regional network covering linked and longitudinal patient-level real-world data (RWD) of over 20 million patients from Florida, Georgia, and Alabama in the United States. To characterize the subtypes, we will use OneFlorida+ data from 2011 to 2023 and develop a research-grade pediatric asthma computable phenotype and clinical natural language processing pipeline to identify pediatric patients with asthma aged 2-18 years. We will then apply federated learning to characterize pediatric asthma subtypes and their temporal progression. Using the Promoting Action on Research Implementation in Health Services framework, we will conduct focus groups with practicing pediatric asthma clinicians within the OneFlorida+ network to investigate the clinical utility of the subtypes. With a user-centered design, we will create prototypes to visualize the subtypes in the EHR to best assist with the clinical management of children with asthma.
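To illustrate the federated learning step in general terms, the sketch below shows one common scheme, federated averaging, in which each site trains on its own records and shares only model weights with a coordinating server. This is an assumption made for illustration: the protocol does not state which aggregation algorithm will be used, and the one-feature linear model here merely stands in for the deep neural networks mentioned in the Background. All function names and the synthetic data are hypothetical.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one site's private data.

    `data` is a list of (x, y) pairs for a linear model y = w * x + b.
    Only the updated weights leave the site, never the records.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def run_federated_training(sites, rounds=100):
    """Alternate local training at every site with server-side averaging."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in sites]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_site(n, seed):
    """Hypothetical synthetic site data drawn from y = 2x + 1 (no real EHR data)."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]


# Three simulated sites of different sizes; raw records stay in their own lists.
sites = [make_site(30, 1), make_site(50, 2), make_site(20, 3)]
w, b = run_federated_training(sites)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

Weighting each site's update by its record count keeps larger sites from being underrepresented in the shared model. A real deployment would typically add protections such as secure aggregation, which this sketch omits.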
Results
OneFlorida+ data from 2011 to 2023 have been collected for 411,628 patients aged 2-18 years, along with 11,156,148 clinical notes. We expect to complete the computable phenotyping within the first year of the project, followed by subtyping during the second and third years, and then to conduct the focus groups and establish the user-centered design in the fourth and fifth years.
Conclusions
Pediatric asthma subtypes incorporating RWD from diverse populations could improve patient outcomes by moving the field closer to precision pediatric asthma care. Our privacy-preserving federated learning methodology and qualitative implementation work will address several challenges of applying ML to large, multicenter RWD.
International Registered Report Identifier (IRRID)
DERR1-10.2196/57981