Sufficient principal component regression for pattern discovery in transcriptomic data

Author:

Ding Lei (1), Zentner Gabriel E (2, 3), McDonald Daniel J (4)

Affiliation:

1. Department of Statistics, Indiana University, Bloomington, IN 47405, USA

2. Department of Biology, Indiana University, Bloomington, IN 47405, USA

3. Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, IN 46202, USA

4. Department of Statistics, University of British Columbia, Vancouver, BC, Canada

Abstract

Motivation: Methods for the global measurement of transcript abundance such as microarrays and RNA-Seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally tractable insights from such data therefore requires high-dimensional prediction. Existing sparse linear approaches to this challenge have been stunningly successful, but some important issues remain. These methods can fail to select the correct features, predict poorly relative to non-sparse alternatives or ignore any unknown grouping structures for the features.

Results: We propose a method called SuffPCR that yields improved predictions in high-dimensional tasks including regression and classification, especially in the typical context of omics with correlated features. SuffPCR first estimates sparse principal components and then estimates a linear model on the recovered subspace. Because the estimated subspace is sparse in the features, the resulting predictions will depend on only a small subset of genes. SuffPCR works well on a variety of simulated and experimental transcriptomic data, performing nearly optimally when the model assumptions are satisfied. We also demonstrate near-optimal theoretical guarantees.

Availability and implementation: Code and raw data are freely available at https://github.com/dajmcdon/suffpcr. Package documentation may be viewed at https://dajmcdon.github.io/suffpcr.

Contact: daniel@stat.ubc.ca

Supplementary information: Supplementary data are available at Bioinformatics Advances online.
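The two-stage idea described in the abstract (estimate a sparse principal subspace, then fit a linear model on it) can be illustrated with a minimal NumPy sketch. This is not the authors' SuffPCR algorithm or the API of the suffpcr package: the sparsification step here is a crude stand-in that keeps the features with the largest loading norms from ordinary PCA, and the names `sparse_pcr_sketch`, `predict`, `n_components` and `n_features_keep` are hypothetical, chosen only for this example.

```python
import numpy as np


def sparse_pcr_sketch(X, y, n_components=2, n_features_keep=10):
    """Illustrative sparse principal component regression.

    Assumption: sparsity is imposed by hard-thresholding ordinary PCA
    loadings, which is a simplification and not the SuffPCR estimator.
    Requires n_features_keep >= n_components.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean

    # Stage 1a: ordinary PCA loadings from the SVD of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T  # shape (p, n_components)

    # Stage 1b: keep only the features (rows) with the largest loading norms.
    row_norms = np.linalg.norm(V, axis=1)
    keep = np.sort(np.argsort(row_norms)[-n_features_keep:])

    # Re-orthonormalize the retained rows so the columns form a basis.
    Q, _ = np.linalg.qr(V[keep])
    V_sparse = np.zeros_like(V)
    V_sparse[keep] = Q

    # Stage 2: least squares of y on the low-dimensional scores.
    scores = Xc @ V_sparse
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)

    # Coefficients on the original features; zero outside `keep`.
    beta = V_sparse @ gamma
    return beta, x_mean, y_mean, keep


def predict(X_new, beta, x_mean, y_mean):
    """Predict responses for new rows using the fitted sparse coefficients."""
    return (X_new - x_mean) @ beta + y_mean
```

Because the basis is zero outside the retained rows, the fitted coefficient vector is nonzero for at most `n_features_keep` features, which mirrors the abstract's point that predictions depend on only a small subset of genes.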

Funder

National Science Foundation

National Institutes of Health

Natural Sciences and Engineering Research Council of Canada (NSERC)

Publisher

Oxford University Press (OUP)

Subject

Cell Biology, Developmental Biology, Embryology, Anatomy
