Prediction of Allogeneic HSCT Related Mortality in Acute Leukemia: Exploring Boundaries of Prediction through Machine Learning Based Modeling. A Data Mining Study from the Acute Leukemia Working Party (ALWP) of the EBMT

Author:

Shouval Roni123, Labopin Myriam4, Unger Ron2, Giebel Sebastian5, Ciceri Fabio6, Schmid Christoph7, Esteve Jordi8, Baron Frédéric9, Savani Bipin N.10, Mohty Mohamad11, Nagler Arnon124

Affiliation:

1. Division of Hematology and Bone Marrow Transplantation, Chaim Sheba Medical Center, Tel-Hashomer, Ramat-Gan, Israel

2. Department of Computational Biology and Bioinformatics, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel

3. The Chaim Sheba Medical Center, Tel-Hashomer, Israel

4. EBMT Acute Leukemia Working Party and Registry, Paris, France

5. Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice Branch, Gliwice, Poland

6. San Raffaele Scientific Institute, Milano, Italy

7. Department of Hematology and Oncology, Klinikum Augsburg, Ludwig-Maximilians-University, Munich, Germany

8. Hospital Clínic, IDIBAPS, Barcelona, Spain

9. University of Liège, Liege, Belgium

10. Vanderbilt University Medical Center, Nashville, TN

11. Service d'Hématologie et Thérapie Cellulaire, AP-HP, UPMC Université Paris 6, UMR-S 938, CEREST-TC EBMT, Hôpital Saint Antoine, Paris, France

12. EBMT Acute Leukemia Working Party and Registry, Hematology Division, BMT and Cord Blood Bank, Tel-Aviv University, Chaim Sheba Medical Center, Tel-Hashomer, Ramat-Gan, Israel

Abstract

Background: Allogeneic hematopoietic stem cell transplantation (allo-HSCT) has been shown to increase survival and induce cure of acute leukemia (AL). Unfortunately, transplant-related mortality (TRM) remains high. Risk scores based on a conventional statistical approach have been developed for TRM prediction and have been well validated. Nevertheless, their predictive performance is suboptimal, which limits clinical utility. Factors impeding prediction might be attributed to the statistical methodology, the number and quality of features collected, or simply the size of the population analyzed. We set out to explore these factors using a novel computational approach based on machine learning (ML) algorithms. ML is a subfield of computer science and artificial intelligence that deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions. Commonly applied in complex data scenarios, such as financial and technological settings, it may be suitable for outcome prediction in the field of HSCT.

Study design: Using a cohort of 28,236 adult allo-HSCT recipients from the ALWP registry of the EBMT, transplanted between 2000 and 2011 for acute myeloid leukemia or acute lymphoblastic leukemia, and described by 24 variables (i.e., patient, leukemia, donor, and transplant characteristics), we devised a two-phase data mining study: 1) development of ML-based prediction models for day 100 TRM; 2) in silico analysis (i.e., performed through a computerized simulation) of the developed models. Factors necessary for optimal prediction were explored: type of model, size of data set, number of necessary variables, and performance in specific subpopulations. Model development and analysis were performed with WEKA, a data mining suite.
The area under the receiver operating characteristic curve (AUC) is a commonly used evaluation method for binary choice problems, which involve classifying an instance as either positive or negative. A perfect model will score an AUC of 1, while random guessing will score an AUC of around 0.5. The AUC was used as the measure of predictive performance for the developed models.

Results: We developed six machine learning based prediction models for TRM at day 100. Optimal AUCs ranged from 0.65 to 0.68. Predictive performance plateaued for a population size ranging from n=5647 to n=8471, depending on the algorithm (Figure 1). A feature selection algorithm ranked variables according to importance. Provided with the ranked variable data, we found that 6-12 ranked variables were necessary for optimal prediction, depending on the algorithm (Figure 2). Predictive performance of models developed for specific subpopulations ranged from an average AUC of 0.59 for patients in second complete remission to 0.67 for patients receiving reduced intensity conditioning.

Conclusions: We present a novel computational approach for prediction model development and analysis in the field of HSCT. Using data commonly collected on transplant patients, our simulation elucidates the factors limiting outcome prediction. Regardless of the methodology applied, predictive performance converged when sampling more than 5000 patients. Few variables (approximately 6-12) "carry the weight" with regard to predictive influence. In summary, the presented findings describe a phenomenon of predictive saturation with traditionally collected data. Improving on the current performance will likely require additional types of input, such as genetic, biologic, and procedural factors.

Disclosures: No relevant conflicts of interest to declare.
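The AUC used as the performance measure in this abstract can be read as the probability that a randomly chosen positive case (here, a patient who died of TRM by day 100) receives a higher predicted risk than a randomly chosen negative case. The following is a minimal Python sketch of that pairwise interpretation, not the authors' WEKA workflow; the function name `auc` and the toy labels and scores are invented for illustration and do not come from the registry.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (1 = event, 0 = no event).
labels = [1, 1, 0, 0, 0]
perfect = [0.9, 0.8, 0.3, 0.2, 0.1]   # every positive outranks every negative
constant = [0.5, 0.5, 0.5, 0.5, 0.5]  # uninformative model: all ties
mixed = [0.9, 0.2, 0.3, 0.8, 0.1]     # partially informative model

print(auc(labels, perfect))
print(auc(labels, constant))
print(auc(labels, mixed))
```

The quadratic loop is chosen for readability; on a cohort of tens of thousands of patients a rank-based computation or a library routine would be the practical choice.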

Publisher

American Society of Hematology

Subject

Cell Biology, Hematology, Immunology, Biochemistry
