Affiliation:
1. School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China
Abstract
Recent studies have reported that amyloid proteins are closely associated with several common diseases, such as Alzheimer's disease, Parkinson's disease, and type 2 diabetes. It is therefore an urgent task to discriminate amyloid proteins from non‐amyloid proteins. In this work, we developed a new machine learning model to identify amyloid proteins from sequence information. First, fifty different physicochemical (PC) properties were employed to represent the sequences. Then, a sliding window approach was adopted to capture local correlation information based on Pearson's correlation coefficient. The Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was used to select the most discriminative features. Because the negative samples outnumber the positive samples, the popular synthetic minority oversampling technique (SMOTE) was used to balance the dataset. A support vector machine was trained and evaluated with the jackknife test. Compared with existing predictors, the experimental results showed that the proposed method significantly improves the discrimination of amyloid from non‐amyloid proteins. The dataset and code used in this study are available at https://figshare.com/articles/online_resource/iAMY‐DC/20268093.
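The abstract does not spell out how the sliding window and Pearson's correlation coefficient are combined, so the following is only a minimal sketch under one possible reading: two property profiles of the same sequence are correlated inside each window, and the window scores are averaged into a single feature. The function names, the window length of 5, the choice of two properties (Kyte-Doolittle hydropathy and approximate residue mass), and the handling of constant windows are all assumptions made for illustration, not the authors' implementation. The later steps (LASSO, SMOTE, SVM) are not shown.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index (one illustrative PC property).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate average residue masses in Da (second illustrative property).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.1, "C": 103.1, "L": 113.2, "I": 113.2, "N": 114.1,
    "D": 115.1, "Q": 128.1, "K": 128.2, "E": 129.1, "M": 131.2,
    "H": 137.1, "F": 147.2, "R": 156.2, "Y": 163.2, "W": 186.2,
}


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a constant profile;
        # returning 0.0 here is an assumption of this sketch.
        return 0.0
    return sxy / sqrt(sxx * syy)


def windowed_correlation(seq, prop_a, prop_b, window=5):
    """Average, over all windows, of the correlation between two
    property profiles of the same sequence (hypothetical feature)."""
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    profile_a = [prop_a[res] for res in seq]
    profile_b = [prop_b[res] for res in seq]
    scores = [
        pearson(profile_a[i:i + window], profile_b[i:i + window])
        for i in range(len(seq) - window + 1)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy sequence, not taken from the paper's dataset.
    feature = windowed_correlation("ACDEFGHIKLMNPQ", HYDROPATHY, RESIDUE_MASS)
    print(f"hydropathy/mass windowed correlation: {feature:.3f}")
```

With fifty properties, a scheme like this would produce one value per property pair, which is the kind of high-dimensional feature vector that a selection step such as LASSO is meant to reduce.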
Funder
Jiangxi Science and Technology Normal University
Cited by
1 article.