Machine Learning Methods to Better Predict Post-Hematopoietic Stem Cell Transplant (HSCT) Leukemic Relapse in Pediatric Patients with Acute Lymphoblastic Leukemia: Random Forest (RF) Classification Featuring Serial Post-Transplant Lineage-Specific Chimerism

Author:

Shyr David C (1), Zhang Bing Melody (2), Parkman Robertson (3), Brewer Simon E. (4)

Affiliation:

1. Department of Pediatrics/Division of Stem Cell Transplantation and Regenerative Medicine, Stanford School of Medicine, Palo Alto, CA

2. Department of Pathology, Stanford University, Palo Alto, CA

3. Division of Stem Cell Transplantation and Regenerative Medicine, Department of Pediatrics, Stanford University Medical Center, Stanford, CA

4. University of Utah, Salt Lake City, UT

Abstract

The ability to accurately predict leukemic relapse post-HSCT would improve outcomes by allowing pre-emptive therapeutic strategies. Recent studies have identified post-transplant T-cell and CD34 cell chimerism as predictors of relapse in patients who had undergone HSCT for hematologic malignancies (Preuner et al, 2016; Lee et al, 2015). However, these studies assess relapse risk at only a single threshold of chimerism using standard regression analysis, which permits only limited consideration of other patient variables. As a result, the findings of these analyses are frequently not generalizable to other patients. Machine learning methods can capture nonlinear relationships and simultaneous interactions between multiple variables, and thus better recapitulate the dynamics and nuances of the relapse process in different patients. We use machine learning methods, specifically random forest (RF) classification, to build a predictive model of post-transplant relapse and to analyze data from a cohort of 46 pediatric patients who received HSCT for acute lymphoblastic leukemia (ALL) and had serial lineage-specific chimerism testing post-transplant. Our model achieved 58% sensitivity and 98% specificity at predicting relapses in cross-validation, compared with a baseline model (24% sensitivity, 76% specificity). Consistent with previous reports, our model implicates both peripheral blood (PB) donor CD34 and CD3 chimerism as important variables for relapse. More importantly, the RF showed how different variables interacted with each other, providing additional insight into how best to interpret post-transplant chimerism results. To our knowledge, this is the first study featuring RF machine learning methods in the clinical setting of relapse after HSCT. We use a dataset of patients with ALL who underwent HSCT at Lucile Packard Children's Hospital from 2012 to 2018. Variables collected are summarized in Table 1.
The analytical sensitivity of STR-based chimerism testing is 1%. Chimerism results obtained on the day of relapse were excluded from the analysis. The RF model consists of 500 individual decision trees, each fit to a bootstrapped sample of the patient data. A 5-fold cross-validation was used to test predictive skill, with 20% of patients held out in each fold. We compared results with a Monte Carlo baseline model in which relapse status was repeatedly assigned at random to each patient with a probability based on the prevalence of relapse in our cohort. Patient, transplantation, and relapse characteristics are summarized in Table 2. Chimerism data are summarized in Table 3. The cross-validation results show robust predictive skill for relapse within 2 years post-transplant. Our RF achieved 58% sensitivity and 98% specificity, greatly improving on the predictive values of the baseline model (Table 4). Variable importance, the ability of a variable to decrease the error of the prediction model, was calculated for all variables used in our RF (Figure 1). Our analysis shows that age at the time of transplant has the highest importance, followed by PB donor CD34 chimerism. Bone marrow chimerism generally has lower importance, suggesting that PB monitoring alone is adequate in the clinical setting. We showcase the relationships of 1) age at transplant, 2) donor PB CD34 chimerism, and 3) donor PB CD3 chimerism to the odds of relapse using partial dependence plots. Younger patients relapse less often. Donor PB CD34 chimerism exhibits a threshold effect, in which the odds of relapse decrease dramatically when it is above 95%, while donor PB CD3 chimerism has a more gradual, linear profile (Figure 2). A 2D partial dependence plot of donor PB CD34 and PB CD3 chimerism, both treated as continuous variables, shows the interaction of the two (Figure 3): relapse risk remains low even if donor PB CD3 chimerism is as low as 50%, as long as donor PB CD34 chimerism is > 95%.
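The Monte Carlo baseline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the number of random draws, and the cohort composition (46 patients with 11 relapses) are hypothetical. The abstract does not report the relapse count; 11 was chosen only because it gives a prevalence near the reported 24% baseline sensitivity.

```python
import random


def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for two equal-length lists of booleans."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)                # relapse, predicted relapse
    tn = sum((not t) and (not p) for t, p in pairs)    # no relapse, predicted none
    fn = sum(t and (not p) for t, p in pairs)          # relapse missed
    fp = sum((not t) and p for t, p in pairs)          # false alarm
    return tp / (tp + fn), tn / (tn + fp)


def monte_carlo_baseline(truth, n_draws=10_000, seed=0):
    """Average sensitivity/specificity when relapse is assigned at random.

    In every draw each patient is labelled 'relapse' with probability equal
    to the observed relapse prevalence, ignoring all patient variables.
    n_draws and seed are illustrative choices, not values from the abstract.
    """
    rng = random.Random(seed)
    prevalence = sum(truth) / len(truth)
    sens_total = spec_total = 0.0
    for _ in range(n_draws):
        predicted = [rng.random() < prevalence for _ in truth]
        sens, spec = sensitivity_specificity(truth, predicted)
        sens_total += sens
        spec_total += spec
    return sens_total / n_draws, spec_total / n_draws


# Hypothetical cohort: 46 patients, 11 relapses (illustrative counts only).
truth = [True] * 11 + [False] * 35
baseline_sens, baseline_spec = monte_carlo_baseline(truth)
print(f"baseline sensitivity ~ {baseline_sens:.2f}, specificity ~ {baseline_spec:.2f}")
```

Because labels are drawn independently of the true outcome, the averaged sensitivity converges to the prevalence and the averaged specificity to one minus the prevalence. The reported baseline of 24% sensitivity and 76% specificity is consistent with a relapse prevalence of roughly 24%.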
Our study shows that machine learning methods such as RF can be very useful for building accurate predictive models of post-HSCT complications that incorporate multiple variables, allowing for more granular differentiation between patients. Such analyses can enable more effective deployment of risk-adapted, personalized treatment. By building hundreds of independent decision trees, the RF is also able to provide useful insights into the interactions between different variables in a clinically relevant manner.

Disclosures: No relevant conflicts of interest to declare.

Publisher

American Society of Hematology

Subject

Cell Biology, Hematology, Immunology, Biochemistry

Cited by 1 article.

1. Predicting the Survival Status of Patient after Bone Marrow Transplant Using Fuzzy Discernibility Matrix;2022 OPJU International Technology Conference on Emerging Technologies for Sustainable Development (OTCON);2023-02-08
