Affiliation:
1. Department of Biomedical and Health Informatics Children's Hospital of Philadelphia Philadelphia Pennsylvania USA
2. Perelman School of Medicine University of Pennsylvania School of Medicine Philadelphia Pennsylvania USA
3. Division of Oncology The Children's Hospital of Philadelphia Philadelphia Pennsylvania USA
4. Department of Pediatrics Emory University School of Medicine Atlanta Georgia USA
5. Aflac Cancer & Blood Disorders Center Children's Healthcare of Atlanta Atlanta Georgia USA
6. Division of Infectious Diseases The Children's Hospital of Philadelphia Philadelphia Pennsylvania USA
Abstract
Background: Administrative datasets are useful for identifying rare disease cohorts such as pediatric acute myeloid leukemia (AML). Previously, cohorts were assembled using labor-intensive manual reviews of patients' longitudinal chemotherapy data.

Methods: We used a two-step machine learning (ML) method to (i) identify pediatric patients with newly diagnosed AML and (ii) identify the chemotherapy courses of those patients in an administrative/billing database. Using 2558 patients who had previously been manually reviewed, multiple ML algorithms were derived from 75% of the study sample, and the selected model was tested in the remaining hold-out sample. The selected model was also applied to assemble a new pediatric AML cohort and was further assessed in an external validation against a standalone cohort established by manual chart abstraction.

Results: For patient identification, the selected Support Vector Machine model yielded a sensitivity of 0.97 and a positive predictive value (PPV) of 0.97 in the hold-out test sample. For course-specific chemotherapy regimen and start date identification, the selected Random Forest model yielded an overall PPV of at least 0.88 and a sensitivity of at least 0.86 across all courses in the test sample. When applied to new cohort assembly, ML identified 3016 AML patients with 10,588 treatment courses. In the external validation subset, PPV was at least 0.75 and sensitivity at least 0.82 for patient identification, and PPV was at least 0.93 and sensitivity at least 0.94 for regimen identification.

Conclusion: A carefully designed ML model can accurately identify pediatric AML patients and their chemotherapy courses from administrative databases. This approach may be generalizable to other diseases and databases.
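The abstract reports performance as sensitivity and PPV measured against manual review. As a minimal sketch of how those two metrics are computed from binary labels, the Python below uses a hypothetical helper, `sensitivity_and_ppv`, and made-up toy labels; neither the function name nor the data comes from the study.

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for binary labels, where 1 = AML and 0 = not AML.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    # Sensitivity: share of manually confirmed cases that the model found.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV: share of model-flagged cases that manual review confirmed.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy labels (assumed, not study data): manual review is the reference standard.
manual_review = [1, 1, 1, 1, 0, 0, 0, 0]
model_output = [1, 1, 1, 0, 1, 0, 0, 0]

sens, ppv = sensitivity_and_ppv(manual_review, model_output)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In a rare-disease cohort, PPV is typically reported alongside sensitivity rather than specificity, because the large number of true negatives in an administrative database would make specificity look high regardless of how many false positives enter the cohort.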
Subject
Oncology; Hematology; Pediatrics, Perinatology and Child Health
Cited by
4 articles.