Author:
Wang Hengkang, Lu Han, Sun Ju, Safo Sandra E.
Abstract
Background
Technological advances have enabled the generation of unique and complementary types of data or views (e.g. genomics, proteomics, metabolomics) and opened up a new era in multiview learning research with the potential to lead to new biomedical discoveries.
Results
We propose iDeepViewLearn (Interpretable Deep Learning Method for Multiview Learning) to learn nonlinear relationships in data from multiple views while achieving feature selection. iDeepViewLearn combines the flexibility of deep learning with the statistical benefits of data- and knowledge-driven feature selection, giving interpretable results. Deep neural networks are used to learn a view-independent low-dimensional embedding through an optimization problem that minimizes the difference between observed and reconstructed data while imposing a regularization penalty on the reconstructed data. The normalized Laplacian of a graph is used to model bilateral relationships between variables in each view, thereby encouraging the selection of related variables. iDeepViewLearn is tested on simulated data and three real-world datasets for classification, clustering, and reconstruction tasks. For the classification tasks, iDeepViewLearn achieved results competitive with state-of-the-art methods in various settings. For the clustering task, we detected molecular clusters that differed in their 10-year survival rates for breast cancer. For the reconstruction task, we were able to reconstruct handwritten images using a few pixels while achieving competitive classification accuracy. The results of our real-data applications and simulations with small to moderate sample sizes suggest that iDeepViewLearn may be a useful method for small-sample-size problems compared to other deep learning methods for multiview learning.
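To make the two ingredients named above more concrete, the following is a minimal sketch in Python (NumPy) of a normalized graph Laplacian and of a reconstruction loss with a penalty on the reconstructed data. It is not the authors' implementation. The abstract does not state the exact form of the penalty, so the column-wise norm used here, and the names `normalized_laplacian`, `objective`, `X`, `R`, and `lam`, are assumptions chosen for illustration only.

```python
import numpy as np

def normalized_laplacian(adjacency):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    # Isolated variables (degree 0) get a zero scaling factor to avoid
    # dividing by zero; their diagonal entry in L is then 1.
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def objective(X, R, L, lam):
    """Squared reconstruction error plus an assumed column-wise penalty.

    X   : observed data for one view (samples x variables)
    R   : reconstructed data for the same view (same shape as X)
    L   : normalized Laplacian over the variables of that view
    lam : non-negative penalty weight (hypothetical tuning parameter)
    """
    reconstruction_error = np.linalg.norm(X - R, "fro") ** 2
    # Assumed penalty: sum of Euclidean norms of the columns of R @ L, so that
    # variables linked in the graph tend to be kept or dropped together.
    penalty = np.linalg.norm(R @ L, axis=0).sum()
    return reconstruction_error + lam * penalty

# Toy view with three variables connected in a path: 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = normalized_laplacian(A)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # five samples, three variables
R = X.copy()                  # a perfect reconstruction, for illustration
loss = objective(X, R, L, lam=0.1)
```

In a full method the reconstruction `R` would be produced by a neural network decoder applied to the shared embedding; here it is simply a copy of `X` so that the sketch stays self-contained.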
Conclusion
iDeepViewLearn is an innovative deep learning model capable of capturing nonlinear relationships between data from multiple views while achieving feature selection. It is fully open source and is freely available at https://github.com/lasandrall/iDeepViewLearn.
Funder
National Institutes of Health
Publisher
Springer Science and Business Media LLC