Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors

Author:

Onah Emmanuel,Uzor Philip F.,Ugwoke Ikenna Calvin,Eze Jude Uche,Ugwuanyi Sunday Tochukwu,Chukwudi Ifeanyi Richard,Ibezim Akachukwu

Abstract

Abstract Background In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the drug discovery process of novel HIV-1 protease inhibitors. In this work, we built and compared the performance of selected machine learning models for the prediction of HIV-1 protease cleavage site utilizing a hybrid of octapeptide sequence information comprising bond composition, amino acid binary profile (AABP), and physicochemical properties as numerical descriptors serving as input variables for some selected machine learning algorithms. Our work differs from antecedent studies exploring the same subject in the combination of octapeptide descriptors and method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models. Results Among the 8 models evaluated in the “stratified” 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, have the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80 and B. Acc. ~ 80.0%). Whereas, the perceptron classifier and the K-nearest neighbors had statistically lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On the other hand, logistic regression, and multi-layer perceptron classifier (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%) had the best performance on further evaluation on the testing set, though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier equally performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%). Conclusions Logistic regression and multi-layer perceptron classifiers have comparable predictive performances to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition and standard physicochemical properties are used as input variables. In our future work, we hope to develop a standalone software for HIV-1 protease cleavage site prediction utilizing the linear regression algorithm and the aforementioned octapeptide sequence descriptors.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3