Author:
Duan Tao,Kuang Zhufang,Wang Jiaqi,Ma Zhihao
Abstract
In recent years, the long noncoding RNA (lncRNA) has been shown to be involved in many disease processes. The prediction of the lncRNA–disease association is helpful to clarify the mechanism of disease occurrence and bring some new methods of disease prevention and treatment. The current methods for predicting the potential lncRNA–disease association seldom consider the heterogeneous networks with complex node paths, and these methods have the problem of unbalanced positive and negative samples. To solve this problem, a method based on the Gradient Boosting Decision Tree (GBDT) and logistic regression (LR) to predict the lncRNA–disease association (GBDTLRL2D) is proposed in this paper. MetaGraph2Vec is used for feature learning, and negative sample sets are selected by using K-means clustering. The innovation of the GBDTLRL2D is that the clustering algorithm is used to select a representative negative sample set, and the use of MetaGraph2Vec can better retain the semantic and structural features in heterogeneous networks. The average area under the receiver operating characteristic curve (AUC) values of GBDTLRL2D obtained on the three datasets are 0.98, 0.98, and 0.96 in 10-fold cross-validation.
Funder
Foundation for Innovative Research Groups of the National Natural Science Foundation of China
Subject
Cell Biology,Developmental Biology
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献