Gene selection by incorporating genetic networks into case-control association studies

Author:

Cao Xuewei, Liang Xiaoyu, Zhang Shuanglin, Sha Qiuying

Abstract

Large-scale genome-wide association studies (GWAS) have been successfully applied to identify a wide range of genetic variants underlying complex diseases. The network-based penalized regression approach was developed to overcome the computational-efficiency challenges of analyzing high-dimensional genomic data by incorporating a biological genetic network. In this paper, we propose a gene selection approach that incorporates genetic networks into case-control association studies for DNA sequence data or DNA methylation data. Instead of using traditional dimension reduction techniques such as principal component analysis and supervised principal component analysis, we use a linear combination of genotypes at SNPs or methylation values at CpG sites in each gene to capture gene-level signals. We develop three approaches for the linear combination: optimally weighted sum (OWS), LD-adjusted polygenic risk score (LD-PRS), and beta-based weighted sum (BWS). OWS and LD-PRS are supervised approaches that depend on the effect of each SNP or CpG site on the case-control status, while BWS can be extracted without using the case-control status. After using one of these linear combinations in each gene to capture gene-level signals, we regularize them to perform gene selection based on the biological network. Simulation studies show that the proposed approaches have higher true positive rates than traditional dimension reduction techniques. We also apply our approaches to DNA methylation data and UK Biobank DNA sequence data for analyzing rheumatoid arthritis. The results show that the proposed methods can select genes potentially related to rheumatoid arthritis that are missed by existing methods.

Author Summary

There is strong evidence that when genes are functionally related to each other in a genetic network, statistical methods that use prior biological network knowledge can outperform methods that ignore genetic network structure. Statistical methods that incorporate genetic network information into human genetic association studies have therefore been widely used since 2008. Here, we take advantage of recently developed methods to capture gene-level signals in network-based penalized regression of high-dimensional genetic data. We show that, in terms of true positive rates, our proposed methods can outperform three traditional principal component-based dimension reduction techniques in several simulation scenarios. In addition, when we apply our methods to both DNA methylation data and DNA sequence data, the genes they identify are significantly enriched in the rheumatoid arthritis pathway, including the genes HLA-DMA, HLA-DPB1, and HLA-DQA2 in the HLA region.
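The two-step idea described above (collapse the variants in each gene into one gene-level score, then penalize the gene-level coefficients using the network) can be illustrated with a minimal sketch. It is not the authors' implementation: the weights are arbitrary placeholders rather than OWS, LD-PRS, or BWS weights, the graph-Laplacian penalty is only one common form of network-based regularization and is assumed here, and the function names `gene_scores` and `laplacian_penalty` are hypothetical.

```python
import numpy as np


def gene_scores(genotypes, gene_index, weights):
    """Collapse a (samples x variants) matrix into (samples x genes) scores.

    Each gene-level score is a weighted sum of the variants mapped to that
    gene. The weights are placeholders, not the paper's OWS/LD-PRS/BWS weights.
    """
    n_genes = int(gene_index.max()) + 1
    scores = np.zeros((genotypes.shape[0], n_genes))
    for g in range(n_genes):
        cols = np.where(gene_index == g)[0]  # variants belonging to gene g
        scores[:, g] = genotypes[:, cols] @ weights[cols]
    return scores


def laplacian_penalty(beta, adjacency):
    """Assumed network penalty beta^T L beta, with L = D - A.

    For an unweighted graph this equals the sum over edges (u, v) of
    (beta_u - beta_v)^2, so connected genes are pushed toward similar effects.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return float(beta @ laplacian @ beta)


# Toy data: 3 samples, 4 variants; variants 0-1 map to gene 0, 2-3 to gene 1.
genotypes = np.array([[0, 1, 2, 0],
                      [1, 1, 0, 2],
                      [2, 0, 1, 1]], dtype=float)
gene_index = np.array([0, 0, 1, 1])
weights = np.array([0.5, 0.5, 1.0, -1.0])  # illustrative values only

scores = gene_scores(genotypes, gene_index, weights)

# Two genes joined by a single network edge.
adjacency = np.array([[0, 1],
                      [1, 0]], dtype=float)
penalty = laplacian_penalty(np.array([1.0, 3.0]), adjacency)
```

In a full analysis, a penalty of this kind would be added to the logistic-regression loss on the gene-level scores, typically together with a sparsity term so that only some genes are selected; those details are not given in the abstract and are left out here.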

Publisher

Cold Spring Harbor Laboratory
