Geometric structure guided model and algorithms for complete deconvolution of gene expression data
-
Published:2022
Issue:3
Volume:4
Page:441
-
ISSN:2639-8001
-
Container-title:Foundations of Data Science
-
Short-container-title:FoDS
Author:
Chen Duan¹, Li Shaoyu², Wang Xue³
Affiliation:
1. Department of Mathematics and Statistics, School of Data Science, University of North Carolina at Charlotte, USA
2. Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
3. Department of Quantitative Health Sciences, Mayo Clinic, Florida, 32224, USA
Abstract
<p style='text-indent:20px;'>Complete deconvolution analysis of bulk RNA-seq data helps distinguish whether differences in disease-associated gene expression profiles (GEPs) between tissues of patients and normal controls arise from changes in the cellular composition of the tissue samples or from changes in the GEPs of specific cell types. One of the major techniques for complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is well known to be a strongly ill-posed problem, so applying it directly to RNA-seq data yields solutions that are difficult to interpret. In this paper, we develop an NMF-based mathematical model and corresponding computational algorithms that improve the identifiability of solutions when deconvolving bulk RNA-seq data. Our approach combines the biological concept of marker genes with the solvability conditions of NMF theory in a geometric-structure-guided optimization model. In this strategy, the geometric structure of the bulk tissue data is first explored with spectral clustering. Then, the identified marker-gene information is integrated as solvability constraints, while the overall correlation graph is used for manifold regularization. Both synthetic and biological data are used to validate the proposed model and algorithms, and the results show significantly improved solution interpretability and accuracy.</p>
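The abstract states that NMF is strongly ill-posed. One way to see this, beyond the familiar scaling and permutation ambiguity, is the following minimal NumPy sketch (our own illustration, not taken from the paper; the function name `alternative_factorization` and the synthetic data are hypothetical). For an invertible nonnegative matrix Q, the pair (WQ, Q⁻¹H) has the same product as (W, H), and it is a second valid NMF whenever Q⁻¹H stays nonnegative. Here that holds because every entry of H is at least 0.2.

```python
import numpy as np

def alternative_factorization(W, H, t=0.1):
    """Return (W @ Q, inv(Q) @ H) for Q = [[1, t], [0, 1]] (k = 2 components).

    The product is unchanged: (W Q)(Q^-1 H) = W H. W Q is nonnegative because
    Q is; Q^-1 H is nonnegative as long as H[0] - t * H[1] >= 0 entrywise.
    """
    Q = np.array([[1.0, t], [0.0, 1.0]])
    return W @ Q, np.linalg.solve(Q, H)  # solve(Q, H) computes Q^-1 H

rng = np.random.default_rng(1)
W = rng.random((6, 2))              # two synthetic "cell-type" profiles over 6 genes
H = 0.2 + 0.8 * rng.random((2, 5))  # synthetic mixing weights, every entry in [0.2, 1)
W2, H2 = alternative_factorization(W, H)

print(np.allclose(W @ H, W2 @ H2))   # True: same bulk matrix X
print(W2.min() >= 0, H2.min() >= 0)  # True True: both factors still nonnegative
print(np.allclose(W, W2))            # False: yet the "profiles" differ
```

Without extra structure, such as marker genes, the data alone cannot tell the two factorizations apart.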
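The abstract does not specify how the spectral clustering step builds its similarity graph. The following NumPy sketch therefore rests on stated assumptions: the objects clustered are genes (rows of the bulk matrix X), affinity is a Gaussian kernel on unit-normalized gene profiles with a hypothetical bandwidth `sigma`, and the embedding follows the normalized-Laplacian (Ng-Jordan-Weiss) recipe, with a deterministic farthest-point start for k-means. It illustrates the general technique, not the authors' exact procedure. The function name `spectral_cluster_genes` and the synthetic data are hypothetical.

```python
import numpy as np

def spectral_cluster_genes(X, k, sigma=0.3, n_iter=100):
    """Cluster the rows (genes) of a nonnegative genes x samples matrix X."""
    # Unit-normalize each gene so similarity reflects the shape of its
    # expression pattern across samples rather than its overall level.
    U0 = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    sq = ((U0[:, None, :] - U0[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))  # Gaussian affinity graph
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the k smallest.
    _, vecs = np.linalg.eigh(L_sym)
    E = vecs[:, :k]
    E /= np.linalg.norm(E, axis=1, keepdims=True) + 1e-12
    # k-means (Lloyd) with farthest-point initialization for determinism.
    centers = [E[0]]
    for _ in range(1, k):
        dist = np.min([((E - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((E[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([E[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Synthetic check: 3 cell types, 20 samples, 10 marker-like genes per type whose
# expression tracks that type's proportion, plus a little noise.
rng = np.random.default_rng(0)
H_true = rng.dirichlet(np.ones(3), size=20).T  # 3 types x 20 samples
X = np.vstack([rng.uniform(1, 5, (10, 1)) * H_true[j] + 0.01 * rng.random((10, 20))
               for j in range(3)])
labels = spectral_cluster_genes(X, k=3)
print(labels)
```

Normalizing rows first means that genes with the same pattern across samples but different expression levels land in the same cluster, which is the behavior expected of marker genes for one cell type.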
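The last two ingredients of the strategy, marker genes as solvability constraints and the correlation graph as a manifold regularizer, can be sketched with multiplicative updates in the style of graph-regularized NMF. This is a minimal NumPy illustration under our own assumptions, not the authors' algorithm. Marker genes, given in a hypothetical `markers` dict that maps a cell-type index to gene indices, are forced to have zero expression in every other cell type. The graph is built over samples from Pearson correlations with negative values clipped. The objective is ||X - WH||_F^2 + lam * tr(H L H^T) with L = D - A. With `lam=0` and an empty `markers` dict, the updates reduce to plain Lee-Seung NMF.

```python
import numpy as np

def marker_graph_nmf(X, k, markers, lam=0.1, n_iter=1000, eps=1e-10, seed=0):
    """Sketch of marker-constrained, graph-regularized NMF: X (genes x samples) ~= W @ H.

    markers: hypothetical format {cell_type_index: [gene indices]}.
    Targets ||X - W H||_F^2 + lam * tr(H L H^T), L = D - A, over W, H >= 0.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    # Sample-sample affinity from Pearson correlation; negative values clipped to 0.
    A = np.clip(np.corrcoef(X.T), 0.0, None)
    np.fill_diagonal(A, 0.0)
    D = np.diag(A.sum(axis=1))
    # Marker constraint: a marker gene of type j is zero in every other type.
    mask = np.ones((n_genes, k))
    for j, genes in markers.items():
        mask[genes, :] = 0.0
        mask[genes, j] = 1.0
    W = (rng.random((n_genes, k)) + eps) * mask
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep entries nonnegative, and zeros in W stay zero,
        # so the marker pattern imposed at initialization is preserved.
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic check: 40 genes, 3 cell types, genes 5j..5j+4 are markers of type j.
rng = np.random.default_rng(0)
W_true = rng.random((40, 3))
for j in range(3):
    W_true[5 * j:5 * j + 5] = 0.0
    W_true[5 * j:5 * j + 5, j] = 1.0 + rng.random(5)
H_true = rng.dirichlet(np.ones(3), size=25).T  # 3 types x 25 samples
X = W_true @ H_true
markers = {j: list(range(5 * j, 5 * j + 5)) for j in range(3)}
W, H = marker_graph_nmf(X, 3, markers, lam=0.01)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3g}")
```

Because the marker pattern pins column j of W to cell type j, the column ordering is fixed by the markers rather than arbitrary. The scaling ambiguity remains, which is why practical graph-regularized NMF codes usually also normalize W or H.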
Publisher
American Institute of Mathematical Sciences (AIMS)
Cited by
2 articles.