Disease gene prediction with privileged information and heteroscedastic dropout

Author:

Shu Juan1, Li Yu2, Wang Sheng3, Xi Bowei1, Ma Jianzhu4

Affiliation:

1. Department of Statistics, Purdue University, West Lafayette, IN 47906, USA

2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China

3. Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA

4. Institute for Artificial Intelligence, Peking University, Beijing 100871, China

Abstract

Motivation: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models.

Results: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease-gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at the training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model can still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. We compared our method with four state-of-the-art methods on the task of prioritizing candidate disease genes in the Online Mendelian Inheritance in Man dataset. Extensive evaluations show that our model improves the prediction accuracy compared to other methods when all the features are available. More importantly, our model makes very accurate predictions when >90% of the features are missing at the test stage.

Availability and implementation: Our method is implemented in Python 3.7 and PyTorch 1.5.0; the code and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
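The idea behind heteroscedastic Gaussian dropout can be illustrated with a minimal sketch. It is not the authors' implementation: a single dense layer stands in for each GNN, only the Python standard library is used instead of PyTorch, and all names (`heteroscedastic_dropout_layer`, `main`, `noise`, `x_priv`) are hypothetical. It assumes the commonly used formulation in which hidden activations are multiplied by Gaussian noise with mean 1 and a standard deviation computed from the privileged features, and the noise is replaced by its mean at test time, so the privileged features are not needed there.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softplus(z):
    """Numerically stable softplus; keeps standard deviations positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def heteroscedastic_dropout_layer(x, x_priv, main, noise, rng, training=True):
    """Hidden activations with input-dependent multiplicative Gaussian noise.

    `main` and `noise` are (weights, bias) pairs for the main branch and the
    mirrored privileged branch (illustrative stand-ins for the two GNNs).
    `x_priv` is read only when `training` is True.
    """
    hidden = [max(0.0, h) for h in linear(x, *main)]  # ReLU activations
    if not training:
        return hidden  # test time: noise is replaced by its mean of 1
    # The privileged branch sets one noise standard deviation per hidden unit.
    sigma = [softplus(s) for s in linear(x_priv, *noise)]
    return [h * rng.gauss(1.0, s) for h, s in zip(hidden, sigma)]


# Toy usage: 2 regular features, 3 privileged features, 2 hidden units.
rng = random.Random(0)
main = ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])
noise = ([[0.4, 0.1, -0.3], [0.2, -0.1, 0.5]], [0.0, 0.0])
x = [1.0, 2.0]
x_priv = [0.5, 1.5, -1.0]

train_out = heteroscedastic_dropout_layer(x, x_priv, main, noise, rng)
test_out = heteroscedastic_dropout_layer(x, None, main, noise, rng,
                                         training=False)
```

In this sketch the privileged features influence only the noise scale during training, which is why the test-time call can pass `None` for `x_priv`.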

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Cited by 17 articles.
