Affiliation:
1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
2. Department of Respiratory Medicine, The Affiliated Suzhou Hospital of Nanjing University Medical School, Suzhou, China
Abstract
Studying the association between microbes and diseases not only aids in the prevention and diagnosis of diseases but also provides crucial theoretical support for new drug development and personalized treatment. Because the laboratory‐based biological tests that confirm microbe–disease relationships are time‐consuming and costly, there is an urgent need for innovative computational frameworks that can predict new microbe–disease associations. Here, we propose a novel computational approach based on a dual‐branch graph convolutional network (GCN) module, abbreviated as DBGCNMDA, for identifying microbe–disease associations. First, DBGCNMDA calculates the similarity matrices of diseases and microbes by integrating functional similarity and Gaussian association profile kernel (GAPK) similarity. Then, two GCN modules extract semantic information from different biological networks from different perspectives. Finally, microbe–disease association scores are predicted from the extracted features. The main innovation of this method lies in the use of two types of information to assess microbe/disease similarity. Additionally, we extend the disease nodes to address the issue of insufficient features caused by low data dimensionality. We optimize the connectivity between homogeneous entities using random walk with restart (RWR) and then use the optimized similarity matrix as the initial feature matrix. To learn from the network structure, we design a dual‐branch GCN module, consisting of GlobalGCN and LocalGCN, that fine‐tunes node representations by introducing side information, including homologous neighbour nodes. We evaluate the performance of the DBGCNMDA model using five‐fold cross‐validation (5‐fold‐CV).
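The similarity‐construction and RWR steps described above can be sketched as follows. This is a minimal, dependency‐free Python sketch under stated assumptions, not the DBGCNMDA implementation: the GAPK bandwidth uses the common Gaussian profile kernel normalisation with an assumed gamma_prime = 1.0; the element‐wise mean in the hypothetical helper fuse() stands in for the integration rule, which this abstract does not specify; and the restart probability of 0.7 in the hypothetical rwr() is an assumed value. The function names and toy data are illustrative only.

```python
import math


def gapk_similarity(assoc, gamma_prime=1.0):
    """Gaussian association profile kernel over the rows of a 0/1 matrix.

    assoc[i] is the association profile of entity i (e.g. one disease's row
    of the disease-microbe matrix). gamma_prime = 1.0 is an assumed default.
    """
    n = len(assoc)
    # Bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in row) for row in assoc) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


def fuse(functional, gapk):
    """Hypothetical fusion rule: element-wise mean of two similarity matrices."""
    return [[(f + g) / 2.0 for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(functional, gapk)]


def rwr(sim, restart=0.7, tol=1e-8, max_iter=1000):
    """Random walk with restart launched from every node.

    Row i of the result holds the steady-state visiting probabilities of a
    walker that returns to node i with probability `restart` at each step.
    """
    n = len(sim)
    # Column-normalise the similarity matrix into a transition matrix.
    col_sums = [sum(sim[r][c] for r in range(n)) for c in range(n)]
    trans = [[sim[r][c] / col_sums[c] if col_sums[c] > 0 else 0.0
              for c in range(n)] for r in range(n)]
    result = []
    for i in range(n):
        e = [1.0 if k == i else 0.0 for k in range(n)]  # restart vector
        p = e[:]
        for _ in range(max_iter):
            # p_next = (1 - restart) * T p + restart * e
            p_next = [(1 - restart) * sum(trans[r][c] * p[c] for c in range(n))
                      + restart * e[r] for r in range(n)]
            diff = max(abs(a - b) for a, b in zip(p_next, p))
            p = p_next
            if diff < tol:
                break
        result.append(p)
    return result


if __name__ == "__main__":
    # Toy data: 3 diseases x 3 microbes, plus a made-up functional similarity.
    assoc = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
    functional = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
    features = rwr(fuse(functional, gapk_similarity(assoc)))
    for row in features:
        print([round(x, 3) for x in row])
```

Each row of the RWR output can serve as the initial feature vector of the corresponding node, in line with using the optimized similarity matrix as the initial feature matrix. The GlobalGCN and LocalGCN branches are not sketched because their architecture is not specified here.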
In 5‐fold‐CV, DBGCNMDA achieves an area under the receiver operating characteristic curve (AUC) of 0.9559 and an area under the precision–recall curve (AUPR) of 0.9630. Case studies against published experimental data confirm a substantial number of the predicted associations, indicating that DBGCNMDA is an effective tool for predicting potential microbe–disease associations.
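The evaluation metrics reported above can be computed as in the following sketch, which is a hypothetical illustration rather than the authors' evaluation code. It assumes that known associations are shuffled and split into five folds (the seed and the (disease, microbe) pair format are assumptions), computes AUC with the Mann–Whitney statistic, and estimates AUPR as average precision, one common estimator. The negative‐sampling protocol and the exact PR‐curve integration rule used for DBGCNMDA are not stated in this abstract.

```python
import random


def five_fold_splits(positive_pairs, k=5, seed=0):
    """Shuffle known (disease, microbe) pairs and split them into k folds.

    Each fold is held out once as the test set; seed and format are assumed.
    """
    pairs = list(positive_pairs)
    random.Random(seed).shuffle(pairs)
    return [pairs[f::k] for f in range(k)]


def auc(labels, scores):
    """ROC AUC as the Mann-Whitney statistic; tied scores count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aupr(labels, scores):
    """AUPR estimated as average precision over a descending ranking.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(auc(labels, scores), aupr(labels, scores))
```

In a full 5‐fold‐CV run, the model would be retrained on four folds, and the held‐out positives together with sampled unknown pairs would be scored before averaging the per‐fold AUC and AUPR.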
Funder
National Natural Science Foundation of China