Abstract
High-throughput data such as metabolomics, genomics, transcriptomics, and proteomics have become familiar data types within the “-omics” family. For this work, we focus on subsets of features that interact with one another and represent these “pathways” as graphs. Observed pathways often have disjoint components, i.e., nodes or sets of nodes (metabolites, etc.) not connected to any other node in the pathway, which notably lessens testing power. In this paper we propose the Pathway Integrated Regression-based Kernel Association Test (PaIRKAT), a new kernel machine regression method for incorporating known pathway information into the semi-parametric kernel regression framework. This work extends previous kernel machine approaches and contributes an application of a graph kernel regularization method for overcoming disconnected pathways. By incorporating a regularized or “smoothed” graph into a score test, PaIRKAT can provide more powerful tests for associations between biological pathways and phenotypes of interest and will be helpful in identifying novel pathways for targeted clinical research. We evaluate this method through several simulation studies and an application to real metabolomics data from the COPDGene study. Our simulation studies illustrate the robustness of this method to incorrect and incomplete pathway knowledge, and the real data analysis shows meaningful improvements in testing power in pathways. PaIRKAT was developed for application to metabolomic pathway data, but the techniques are easily generalizable to other data sources with a graph-like structure.
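To make the idea of a “smoothed” graph inside a score test concrete, the following is a minimal sketch and not the authors' implementation. It assumes a combinatorial Laplacian, a regularization of the form (I + tau L)^-1, a linear kernel, and a linear null model for a continuous phenotype; none of these choices are stated in the abstract. The function names and the parameter `tau` are hypothetical. The sketch computes only the test statistic and leaves out the p-value calculation.

```python
import numpy as np

def regularized_laplacian(adjacency, tau=1.0):
    """Assumed regularization: (I + tau * L)^-1 with L = D - A.

    The result is symmetric positive definite even when the graph has
    isolated nodes, whose rows of L are all zero.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian
    return np.linalg.inv(np.eye(A.shape[0]) + tau * L)

def smoothed_linear_kernel(Z, adjacency, tau=1.0):
    """Linear kernel on pathway features Z (samples x nodes), weighted by the
    regularized graph (an assumed kernel choice)."""
    return Z @ regularized_laplacian(adjacency, tau) @ Z.T

def score_statistic(y, X, K):
    """Variance-component score statistic Q = r' K r / (2 * sigma^2), where r
    are residuals from the null model regressing y on covariates X."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = y - X1 @ beta
    sigma2 = r @ r / (len(y) - X1.shape[1])      # residual variance estimate
    return float(r @ K @ r / (2.0 * sigma2))

# Toy pathway (hypothetical): chain 0-1-2 plus an isolated node 3.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 0]])
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 4))      # simulated metabolite abundances
X = rng.normal(size=(30, 2))      # simulated covariates
y = rng.normal(size=30)           # simulated phenotype

L_R = regularized_laplacian(adjacency, tau=1.0)
Q = score_statistic(y, X, smoothed_linear_kernel(Z, adjacency, tau=1.0))
```

Under this assumed regularization the isolated node keeps a well-defined weight of 1 in the smoothed graph rather than producing a singular matrix, which is one simple way a disconnected pathway can still enter the kernel. A p-value would require the null distribution of Q, which this sketch does not compute.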
Funder
National Heart, Lung, and Blood Institute
Division of Cancer Epidemiology and Genetics, National Cancer Institute
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics