Author:
Xu Bo, Liu Yu, Yu Shuo, Wang Lei, Dong Jie, Lin Hongfei, Yang Zhihao, Wang Jian, Xia Feng
Abstract
Background
Prediction of pathogenic genes is crucial for disease prevention, diagnosis, and treatment. However, traditional genetic localization methods are often technically difficult and time-consuming. With the development of computer science, computational biology has gradually become one of the main approaches for finding candidate pathogenic genes.
Methods
We propose a pathogenic gene prediction method based on network embedding, called Multipath2vec. First, we construct a heterogeneous network, called the GP-network, from three kinds of relationships between genes and phenotypes: correlations between phenotypes, interactions between genes, and known gene-phenotype pairs. Then, to embed the network more effectively, we design a multi-path to guide random walks in the GP-network. The multi-path comprises multiple paths between genes and phenotypes, which can capture the complex structural information of a heterogeneous network. Finally, we use the learned vector representation of each phenotype and protein to calculate similarities, and we rank candidate genes by their similarity to the target phenotype.
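The walk-and-rank steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy graph, the node names, the type patterns (`"gp"`, `"gpp"`, `"ggp"`), and the helper names are all hypothetical, and cosine similarity is assumed because the abstract does not name the similarity measure. The embedding step itself (for example, skip-gram training on the walk corpus) is omitted, so `rank_genes` simply takes precomputed vectors.

```python
import math
import random

# Toy heterogeneous graph (hypothetical): node -> neighbours.
# The first letter encodes the node type: "g" = gene, "p" = phenotype.
GRAPH = {
    "g1": ["g2", "p1"],
    "g2": ["g1", "p2"],
    "p1": ["g1", "p2"],
    "p2": ["g2", "p1"],
}

def node_type(node):
    """Return the type tag of a node (assumed to be its first character)."""
    return node[0]

def guided_walk(graph, start, pattern, length, rng):
    """Random walk whose i-th node must have type pattern[i % len(pattern)]."""
    walk = [start]
    while len(walk) < length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [n for n in graph[walk[-1]] if node_type(n) == wanted]
        if not candidates:  # no neighbour of the required type: stop early
            break
        walk.append(rng.choice(candidates))
    return walk

def multi_path_corpus(graph, patterns, walks_per_node, length, seed=0):
    """Collect walks for every pattern, starting from nodes of its first type."""
    rng = random.Random(seed)
    corpus = []
    for pattern in patterns:
        for node in sorted(graph):
            if node_type(node) != pattern[0]:
                continue
            for _ in range(walks_per_node):
                corpus.append(guided_walk(graph, node, pattern, length, rng))
    return corpus

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_genes(embeddings, phenotype, candidates):
    """Sort candidate genes by decreasing similarity to the target phenotype."""
    target = embeddings[phenotype]
    return sorted(candidates,
                  key=lambda g: cosine(embeddings[g], target),
                  reverse=True)
```

In this sketch, each pattern plays the role of one guiding path, and pooling the walks from several patterns into a single corpus is one plausible reading of "multi-path"; the walks would then be fed to an embedding model before ranking.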
Results
We implemented Multipath2vec and four baseline approaches (CATAPULT, PRINCE, Deepwalk and Metapath2vec) on many-genes gene-phenotype data, single-gene gene-phenotype data and whole gene-phenotype data. Experimental results show that Multipath2vec outperformed these state-of-the-art baselines in the pathogenic gene prediction task.
Conclusions
We propose Multipath2vec, a method for predicting pathogenic genes. Experimental results show that it achieves higher prediction accuracy than the baseline approaches.
Publisher
Springer Science and Business Media LLC
Subject
Genetics (clinical), Genetics
Cited by
15 articles.