Disease gene prediction with privileged information and heteroscedastic dropout

Published:2021-07-01 Issue:Supplement_1 Volume:37 Page:i410-i417
ISSN:1367-4803
Container-title:Bioinformatics
language:en
Author:

Shu Juan¹,Li Yu²,Wang Sheng³,Xi Bowei¹,Ma Jianzhu⁴

Affiliation:

1. Department of Statistics, Purdue University, West Lafayette, IN 47906, USA

2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China

3. Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA

4. Institute for Artificial Intelligence, Peking University, Beijing 100871, China

Abstract

Motivation: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models accurately quantify the similarity between diseases and genes, based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect because of high experimental cost and other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and reduce the generalizability of the models.

Results: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease-gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at the training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model can still incorporate its information during training. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, in which the dropout probability of the GNN model is determined by another GNN model with a mirrored architecture. To evaluate our method, we compared it with four state-of-the-art methods for prioritizing candidate disease genes on the Online Mendelian Inheritance in Man dataset. Extensive evaluations show that our model improves prediction accuracy over the other methods when all features are available. More importantly, our model makes very accurate predictions even when >90% of the features are missing at the test stage.

Availability and implementation: Our method is implemented in Python 3.7 and PyTorch 1.5.0. The method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
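The dropout idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: the noise is multiplicative Gaussian with mean 1, its standard deviation is computed from the privileged features, and a single linear map followed by a softplus stands in for the second, mirrored GNN. The names `noise_scale` and `heteroscedastic_gaussian_dropout` are hypothetical and do not come from the linked repository.

```python
import math
import random


def softplus(z):
    # Numerically stable softplus; keeps the noise scale non-negative.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def noise_scale(privileged, weights, bias=0.0):
    # Assumption: a linear map plus softplus stands in for the second
    # (mirrored) network that reads the privileged features.
    z = sum(w * p for w, p in zip(weights, privileged)) + bias
    return softplus(z)


def heteroscedastic_gaussian_dropout(hidden, privileged=None, weights=None, rng=None):
    # Test stage: privileged features are missing, so no noise is applied.
    if privileged is None:
        return list(hidden)
    rng = rng or random.Random()
    sigma = noise_scale(privileged, weights)
    # Training stage: multiplicative Gaussian noise with mean 1 and a
    # standard deviation that depends on the privileged features.
    return [h * (1.0 + sigma * rng.gauss(0.0, 1.0)) for h in hidden]


if __name__ == "__main__":
    hidden = [1.0, -2.0, 0.5]
    rng = random.Random(0)
    # Training: noisy activations, shaped by the privileged features.
    print(heteroscedastic_gaussian_dropout(hidden, [0.3, 1.2], [0.5, -0.1], rng))
    # Test: activations pass through unchanged.
    print(heteroscedastic_gaussian_dropout(hidden))
```

Because the noise has mean 1, the expected activation during training equals the noise-free activation used at the test stage, which is why the privileged features can be dropped at prediction time in this sketch.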

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Link

http://academic.oup.com/bioinformatics/article-pdf/37/Supplement_1/i410/39620054/btab310.pdf

Cited by 17 articles.

1. Learning using privileged information with logistic regression on acute respiratory distress syndrome detection;Artificial Intelligence in Medicine;2024-10

2. Heterogeneous biomedical entity representation learning for gene–disease association prediction;Briefings in Bioinformatics;2024-07-25

3. Comprehensive Analysis of the Function and Prognostic Value of TAS2Rs Family-Related Genes in Colon Cancer;International Journal of Molecular Sciences;2024-06-21

4. Tissue specific tumor-gene link prediction through sampling based GNN using a heterogeneous network;Medical & Biological Engineering & Computing;2024-04-18

5. Predicting cell-type specific disease genes of diabetes with the biological network;Computers in Biology and Medicine;2024-02