Group sparse canonical correlation analysis for genomic data integration

Published:2013-08-12 Issue:1 Volume:14 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Lin Dongdong, Zhang Jigang, Li Jingyao, Calhoun Vince D, Deng Hong-Wen, Wang Yu-Ping

Abstract

Background The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNPs), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influence on complex diseases. Exploring the relationship between these different types of genomic data sets is challenging. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. Conventional CCA does not work effectively if the number of data samples is significantly smaller than the number of biomarkers, which is typical for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalization with the l-1 norm (CCA-l1) or a combination of the l-1 and l-2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group).
Results We propose a new group sparse CCA method (CCA-sparse group), along with an effective numerical algorithm, to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulated data and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features.
Conclusions The CCA-sparse group method incorporates group effects of features into the correlation analysis while performing individual feature selection simultaneously. On the simulated data, it outperforms the two sCCA methods (CCA-l1 and CCA-group) by identifying correlated features with more true positives while keeping total discordance at a lower level, even when no group effect exists or when irrelevant features are grouped with truly correlated features. Compared with the proposed CCA-sparse group model, CCA-l1 tends to select fewer truly correlated features, while CCA-group tends to select more redundant features.
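
The abstract does not give the authors' algorithm, so the following is only a minimal sketch of the general idea, not the published method. It assumes the common sCCA simplification that each data set has an identity within-set covariance, and it combines an l-1 penalty (individual feature selection) with a group penalty (whole-group selection) through the standard sparse group lasso thresholding step. All names (`sparse_group_threshold`, `sparse_group_cca`, `lam_x`, `lam_y`) and the alternating update scheme are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding (proximal step for an l-1 penalty)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_group_threshold(z, groups, lam_l1, lam_group):
    """Soft-threshold each entry, then shrink or drop each group as a whole."""
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(z)
    for g in np.unique(groups):
        idx = groups == g
        s = soft_threshold(z[idx], lam_l1)
        norm = np.linalg.norm(s)
        cut = lam_group * np.sqrt(idx.sum())  # group size weighting
        if norm > cut:
            out[idx] = s * (1.0 - cut / norm)
    return out


def normalize(w):
    """Scale to unit length; leave an all-zero vector unchanged."""
    n = np.linalg.norm(w)
    return w / n if n > 0 else w


def sparse_group_cca(X, Y, groups_x, groups_y,
                     lam_x=(0.1, 0.1), lam_y=(0.1, 0.1),
                     n_iter=100, tol=1e-6):
    """Alternating thresholded updates for one pair of loading vectors.

    Assumption: columns are non-constant, and within-set covariances are
    treated as identity, so only the cross-correlation matrix K is used.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    K = X.T @ Y / X.shape[0]

    # Start from the leading singular vectors of K.
    U, _, Vt = np.linalg.svd(K, full_matrices=False)
    u, v = U[:, 0], Vt[0]

    for _ in range(n_iter):
        u_new = normalize(sparse_group_threshold(K @ v, groups_x, *lam_x))
        v_new = normalize(sparse_group_threshold(K.T @ u_new, groups_y, *lam_y))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v
```

Setting the group weight to zero in `lam_x` and `lam_y` reduces the update to plain l-1 thresholding, and setting the l-1 weight to zero leaves only group-level selection, which mirrors the contrast between CCA-l1 and CCA-group drawn in the abstract.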

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-14-245.pdf

References: 49 articles.

1. Hamid JS, et al: Data integration in genetics and genomics: methods and challenges. Proteomics Hum Genomics. 2009, 2009:

2. Le Cao KA, et al: Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinform. 2009, 10: 34. doi:10.1186/1471-2105-10-34.

3. Wiley HS: Integrating multiple types of data for signaling research: challenges and opportunities. Sci Signal. 2011, 4 (160): pe9. doi:10.1126/scisignal.2001826.

4. Le Cao KA, et al: A sparse PLS for variable selection when integrating omics data. Stat Appl Genet Mol Biol. 2008, 7: 35-

5. Hotelling H: Relations between two sets of variates. Biometrika. 1936, 28: 321-377.

Cited by 81 articles.

1. Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis—An application in renal clear cell carcinoma;Genetic Epidemiology;2024-05-15

2. Subspace Newton method for sparse group $\ell_0$ optimization problem;Journal of Global Optimization;2024-04-29

3. Root-associated bacterial communities and root metabolite composition are linked to nitrogen use efficiency in sorghum;mSystems;2024-01-23

4. Imaging Genetics;Medical Image Analysis;2024

5. Predicting Microbe-Metabolite Interactions by Integrating Non-negative Matrix Factorization and Generative Network;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05