Leveraging machine learning to identify acute myeloid leukemia patients and their chemotherapy regimens in an administrative database

Author:

Cao Lusha (1), Huang Yuan‐Shung (1), Wu Chao (1), Getz Kelly (2, 3), Miller Tamara P. (4, 5), Ruiz Jenny (2, 3), Fisher Brian T. (2, 6), Seif Alix E. (2, 3), Aplenc Richard (2, 3), Li Yimei (2, 3)

Affiliation:

1. Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA

2. Perelman School of Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA

3. Division of Oncology, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA

4. Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, USA

5. Aflac Cancer & Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, Georgia, USA

6. Division of Infectious Diseases, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA

Abstract

Background: Administrative datasets are useful for identifying rare disease cohorts such as pediatric acute myeloid leukemia (AML). Previously, cohorts were assembled using labor‐intensive, manual reviews of patients' longitudinal chemotherapy data.

Methods: We utilized a two‐step machine learning (ML) method to (i) identify pediatric patients with newly diagnosed AML and (ii) identify the chemotherapy courses of those patients in an administrative/billing database. Using 2558 patients who had previously been manually reviewed, multiple ML algorithms were derived from 75% of the study sample, and the selected model was tested in the remaining hold‐out sample. The selected model was also applied to assemble a new pediatric AML cohort and was further assessed in an external validation, using a standalone cohort established by manual chart abstraction.

Results: For patient identification, the selected Support Vector Machine model yielded a sensitivity of 0.97 and a positive predictive value (PPV) of 0.97 in the hold‐out test sample. For course‐specific chemotherapy regimen and start date identification, the selected Random Forest model yielded overall PPV greater than or equal to 0.88 and sensitivity greater than or equal to 0.86 across all courses in the test sample. When applied to new cohort assembly, ML identified 3016 AML patients with 10,588 treatment courses. In the external validation subset, PPV was greater than or equal to 0.75 and sensitivity was greater than or equal to 0.82 for patient identification, and PPV was greater than or equal to 0.93 and sensitivity was greater than or equal to 0.94 for regimen identification.

Conclusion: A carefully designed ML model can accurately identify pediatric AML patients and their chemotherapy courses from administrative databases. This approach may be generalizable to other diseases and databases.
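
The abstract reports model performance as sensitivity and positive predictive value (PPV). As a reading aid, the following minimal sketch shows how these two metrics are conventionally computed when model output is compared against a manually reviewed reference. The function name `sensitivity_and_ppv` and the toy labels are illustrative assumptions, not the authors' code or study data.

```python
def sensitivity_and_ppv(y_true, y_pred):
    """Return (sensitivity, PPV) for binary labels (1 = AML, 0 = not AML).

    y_true: reference labels, e.g. from manual review (hypothetical here).
    y_pred: labels assigned by a classifier (hypothetical here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    # Sensitivity: share of true cases that the model found.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV: share of model-flagged cases that are true cases.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy labels for illustration only; these are not study data.
manual_review = [1, 1, 1, 0, 0, 1, 0, 1]
model_output = [1, 1, 0, 0, 1, 1, 0, 1]
sens, ppv = sensitivity_and_ppv(manual_review, model_output)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this toy case there are four true positives, one missed case, and one false alarm, so both metrics come out to 0.80.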

Publisher

Wiley

Subject

Oncology; Hematology; Pediatrics, Perinatology and Child Health
