Lifelong Machine Learning and root cause analysis for large-scale cancer patient data

Published: 2019-12; Volume: 6; Issue: 1
ISSN: 2196-1115
Journal: Journal of Big Data
Language: en
Journal abbreviation: J Big Data

Author:

Pal Gautam, Hong Xianbin, Wang Zhuo, Wu Hongyi, Li Gangmin, Atkinson Katie

Abstract

Introduction
This paper presents a lifelong learning framework that continuously adapts to changing data patterns over time through an incremental learning approach. In many big data systems, iteratively retraining on high-dimensional data from scratch is computationally infeasible, because constant ingestion of streaming data on top of a historical data pool increases the training time exponentially. There is therefore a need to retain past learning and to update the model quickly and incrementally as new data arrive. In addition, current machine learning approaches make predictions without providing a comprehensive root cause analysis. To address these limitations, our framework builds on an ensemble of stream data and historical batch data to produce an incremental lifelong machine learning (LML) model.

Case description
A cancer patient's pathological tests, such as blood, DNA, urine, or tissue analysis, provide a unique signature based on DNA combinations. Our analysis enables personalized, targeted medication and achieves a therapeutic response. The model is evaluated on data from The National Cancer Institute's Genomic Data Commons unified data repository. The aim is to prescribe personalized medicine based on thousands of genotype and phenotype parameters for each patient.

Discussion and evaluation
The model uses a dimension reduction method to reduce training time in an online sliding-window setting. We identify the Gleason score as a determining factor for cancer likelihood and substantiate this claim with the Lilliefors and Kolmogorov–Smirnov tests. We present clustering and Random Decision Forest results, and we compare the model's prediction accuracy with that of standard machine learning algorithms for numeric and categorical fields.

Conclusion
We propose an ensemble framework of stream and batch data for incremental lifelong learning. The framework first applies a streaming clustering technique and then a Random Decision Forest (RDF) regressor/classifier to isolate anomalous patient data, and it provides reasoning through root cause analysis based on feature correlations, with the aim of improving the overall survival rate. While the stream clustering technique groups patient profiles, RDF drills down into each group to compare them and derive useful, actionable insights. The proposed MALA architecture retains previously learned knowledge, transfers it to future learning, and iteratively becomes more knowledgeable over time.
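The abstract describes a streaming clustering step that keeps past learning while new patient records arrive in a sliding window, instead of retraining on the whole historical pool. The Python sketch below illustrates that idea under stated assumptions: it uses MacQueen-style sequential k-means as a stand-in for the streaming clustering technique, which the abstract does not name, and the class `StreamingKMeans`, its `window` parameter, and the toy two-feature records are hypothetical, not taken from the paper. It does not reproduce the dimension reduction, the Random Decision Forest stage, or the MALA architecture.

```python
"""Incremental (streaming) k-means with a bounded sliding window.

Assumption: MacQueen-style sequential k-means stands in for the paper's
streaming clustering step; all names and data here are hypothetical.
"""
from collections import deque
from typing import Deque, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class StreamingKMeans:
    """Updates k centroids one record at a time instead of retraining from scratch."""

    def __init__(self, initial_centroids: Sequence[Sequence[float]], window: int = 100):
        self.centroids: List[Vector] = [list(c) for c in initial_centroids]
        self.counts: List[int] = [0] * len(self.centroids)
        # Most recent (cluster, record) pairs; the deque drops the oldest automatically.
        self.window: Deque[Tuple[int, Vector]] = deque(maxlen=window)

    def assign(self, x: Sequence[float]) -> int:
        """Index of the centroid nearest to x."""
        return min(range(len(self.centroids)), key=lambda i: sq_dist(x, self.centroids[i]))

    def update(self, x: Sequence[float]) -> int:
        """Assign x to its nearest centroid, move that centroid toward x, return the index."""
        i = self.assign(x)
        self.counts[i] += 1
        # Step size 1/count makes each centroid the running mean of its assigned records
        # (the initial seed is replaced by the first record assigned to it).
        eta = 1.0 / self.counts[i]
        self.centroids[i] = [c + eta * (v - c) for c, v in zip(self.centroids[i], x)]
        self.window.append((i, list(x)))
        return i


if __name__ == "__main__":
    model = StreamingKMeans([[0.0, 0.0], [10.0, 10.0]], window=3)
    for record in [[1.0, 1.0], [9.0, 11.0], [0.0, 2.0], [11.0, 9.0]]:
        model.update(record)
    print(model.centroids)  # [[0.5, 1.5], [10.0, 10.0]]
    print(list(model.window))
```

Each new record costs O(k·d) work (k centroids, d features) rather than a retrain over all stored data. In a pipeline shaped like the one the abstract outlines, the records assigned to each cluster could then be handed to a per-cluster classifier or regressor for drill-down; that stage is not shown here.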

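The abstract supports the Gleason score finding with the Lilliefors and Kolmogorov–Smirnov tests. As a rough illustration of how the Lilliefors variant works, the sketch below computes the KS distance between a sample's empirical CDF and a normal CDF fitted to that same sample. Because the mean and standard deviation are estimated from the data, the standard KS tables do not apply, so the p-value is estimated by Monte Carlo simulation. It uses only the Python standard library; the function names, the default simulation count, and the synthetic data are assumptions, not the authors' code or data.

```python
"""Monte Carlo Lilliefors test for normality (standard library only).

Assumption: illustrative sketch, not the paper's code. The Lilliefors test is
a KS test against a normal distribution whose mean and standard deviation are
estimated from the sample, so the null distribution is simulated here.
"""
import math
import random
import statistics
from typing import Sequence, Tuple


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_stat_normal(sample: Sequence[float]) -> float:
    """Max distance between the empirical CDF and a normal CDF fitted to the sample.

    Requires at least two values that are not all equal (stdev must be > 0).
    """
    xs = sorted(sample)
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The ECDF steps from i/n to (i+1)/n at x: check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def lilliefors_test(sample: Sequence[float], n_sim: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Return (D, p_value) for H0: the sample comes from some normal distribution."""
    d_obs = ks_stat_normal(sample)
    rng = random.Random(seed)
    n = len(sample)
    # D is invariant to location and scale, so simulating N(0, 1) samples suffices.
    exceed = sum(
        ks_stat_normal([rng.gauss(0.0, 1.0) for _ in range(n)]) >= d_obs
        for _ in range(n_sim)
    )
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return d_obs, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Synthetic data only: evenly spaced normal quantiles vs. a two-point sample.
    nd = statistics.NormalDist()
    near_normal = [nd.inv_cdf((i + 0.5) / 50) for i in range(50)]
    two_point = [0.0] * 25 + [10.0] * 25
    print(lilliefors_test(near_normal))
    print(lilliefors_test(two_point))
```

A small D with a large p-value gives no evidence against normality, while a large D with a small p-value suggests the sample is not normally distributed. For real analyses, a tested library routine such as `lilliefors` in `statsmodels.stats.diagnostic` is usually preferable to a hand-rolled simulation.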
Funder

Accenture Technology Labs, Beijing, China

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management, Computer Networks and Communications, Hardware and Architecture, Information Systems

Link

http://link.springer.com/content/pdf/10.1186/s40537-019-0261-9.pdf

Cited by 7 articles.

1. Continual Learning for Time Series Forecasting: A First Survey; ITISE 2024; 2024-07-17

2. Dual-Track Lifelong Machine Learning-Based Fine-Grained Product Quality Analysis; Applied Sciences; 2023-01-17

3. On Paradigm of Industrial Big Data Analytics: From Evolution to Revolution; IEEE Transactions on Industrial Informatics; 2022-12

4. Real-time user clickstream behavior analysis based on apache storm streaming; Electronic Commerce Research; 2021-12-22

5. Lifelong Machine Learning-Based Quality Analysis for Product Review; Proceedings of the 3rd International Conference on Advanced Information Science and System; 2021-11-26