Author:
Yao Dengju,Deng Yuexiao,Zhan Xiaojuan,Zhan Xiaorong
Abstract
Abstract
Background
Many biological studies have shown that lncRNAs regulate the expression of epigenetically related genes. The study of lncRNAs has helped to deepen our understanding of the pathogenesis of complex diseases at the molecular level. Due to the large number of lncRNAs and the complex and time-consuming nature of biological experiments, applying computer techniques to predict potential lncRNA-disease associations is very effective. To explore information between complex network structures, existing methods rely mainly on lncRNA and disease information. Metapaths have been applied to network models as an effective method for exploring information in heterogeneous graphs. However, existing methods are dominated by lncRNAs or disease nodes and tend to ignore the paths provided by intermediate nodes.
Methods
We propose a deep learning model based on hierarchical graphical attention networks to predict unknown lncRNA-disease associations using multiple types of metapaths to extract features. We have named this model the MMHGAN. First, the model constructs a lncRNA-disease–miRNA heterogeneous graph based on known associations and two homogeneous graphs of lncRNAs and diseases. Second, for homogeneous graphs, the features of neighboring nodes are aggregated using a multihead attention mechanism. Third, for the heterogeneous graph, metapaths of different intermediate nodes are selected to construct subgraphs, and the importance of different types of metapaths is calculated and aggregated to obtain the final embedded features. Finally, the features are reconstructed using a fully connected layer to obtain the prediction results.
Results
We used a fivefold cross-validation method and obtained an average AUC value of 96.07% and an average AUPR value of 93.23%. Additionally, ablation experiments demonstrated the role of homogeneous graphs and different intermediate node path weights. In addition, we studied lung cancer, esophageal carcinoma, and breast cancer. Among the 15 lncRNAs associated with these diseases, 15, 12, and 14 lncRNAs were validated by the lncRNA Disease Database and the Lnc2Cancer Database, respectively.
Conclusion
We compared the MMHGAN model with six existing models with better performance, and the case study demonstrated that the model was effective in predicting the correlation between potential lncRNAs and diseases.
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献