An Interpretable Machine Learning Framework for Rare Disease: A Case Study to Stratify Infection Risk in Pediatric Leukemia

Author:

Al-Hussaini Irfan (1,2), White Brandon (1,3), Varmeziar Armon (1,3), Mehra Nidhi (1,3), Sanchez Milagro (1,3), Lee Judy (4), DeGroote Nicholas P. (4), Miller Tamara P. (4,5), Mitchell Cassie S. (1,3,6)

Affiliation:

1. Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA

2. Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

3. Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA

4. Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA

5. Department of Pediatrics, Division of Pediatric Hematology/Oncology, Emory University, Atlanta, GA 30332, USA

6. Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA

Abstract

Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either “high risk” or “low risk” in pediatric ALL (n = 580) and AML (n = 132) with accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
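The regression results above are reported as mean absolute error (MAE), the average absolute difference between the observed and predicted number of infections per patient. As a minimal sketch of how that metric is computed, the Python snippet below uses a hypothetical helper `mean_absolute_error` and made-up infection counts; neither the function nor the numbers come from the study.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |observed - predicted| over all patients."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical infection counts for four patients (illustrative only, not study data).
observed_counts = [0, 2, 5, 1]
predicted_counts = [1, 2, 3, 3]

mae = mean_absolute_error(observed_counts, predicted_counts)
print(mae)  # (1 + 0 + 2 + 2) / 4 = 1.25
```

Read this way, the reported MAE of 2.26 for bacterial infections means the model's predicted count was off by roughly two infections per patient on average.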

Funder

Georgia Institute of Technology President’s Undergraduate Research Award

NIH

Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta

National Science Foundation CAREER award

Chan Zuckerberg Initiative

Publisher

MDPI AG

