Abstract
Motivation
Protein-protein interaction (PPI) networks are crucial for automatically annotating protein functions. Because different types of evidence can be used to define PPI networks, multiple PPI networks exist for the same set of proteins, each capturing their properties from a different aspect. This makes it challenging to use these heterogeneous graphs effectively for protein function prediction. Recently, several deep learning models have combined PPI networks from all evidence types, or concatenated all graph embeddings. However, without a careful selection procedure, these models cannot effectively harness information from the different PPI networks, which vary in density, structure and noise level. Consequently, combining protein features indiscriminately can increase the noise level and decrease model performance.
Results
We develop DualNetGO, a dual-network model comprising a classifier and a selector, to predict protein functions by effectively selecting features from different sources, including graph embeddings of PPI networks, protein domain information and subcellular location information. Evaluation of DualNetGO on human and mouse datasets, in comparison with other network-based models, shows at least 4.5%, 6.2% and 14.2% improvement in Fmax in the BP (biological process), MF (molecular function) and CC (cellular component) Gene Ontology categories, respectively, for human, and 3.3%, 10.6% and 7.7% improvement in Fmax for mouse. We further show that our model is insensitive to the choice of graph embedding method and is time- and memory-efficient. These results demonstrate that combining a subset of features selected by our model, including PPI networks and protein attributes, uses PPI network information more effectively than using only one kind of graph embedding or concatenating graph embeddings from all kinds of PPI networks.
Availability and implementation
The source code of DualNetGO and some of the experimental data are available at: https://github.com/georgedashen/DualNetGO.
Publisher
Cold Spring Harbor Laboratory