Abstract
Plant Disease Resistance (PDR) proteins are critical in identifying and killing plant pathogens. Predicting PDR proteins is essential for understanding plant-pathogen interactions and developing strategies for crop protection. This study proposes a hybrid model for predicting and designing PDR proteins against plant-invading pathogens. Initially, we tried alignment-based approaches: BLAST for similarity search and MERCI for motif search. These approaches exhibit very poor coverage (sensitivity). To overcome this limitation, we developed alignment-free, machine learning-based methods. Our machine learning model built on compositional features of proteins achieved a maximum AUROC of 0.92. Performance improved significantly, from an AUROC of 0.92 to 0.95, when we used evolutionary information instead of the protein sequence. Finally, we developed a hybrid (ensemble) model that combines our best machine learning model with BLAST and obtained the highest AUROC of 0.98 on the validation dataset. We trained and tested our models on a training dataset and evaluated them on a validation dataset. None of the proteins in our validation dataset are more than 40% similar to proteins in the training dataset. One of the objectives of this study is to support the scientific community working in plant biology. Thus, we developed an online platform for predicting and designing plant resistance proteins, “PlantDRPpred” (https://webs.iiitd.edu.in/raghava/plantdrppred).
Highlights
Development of a machine learning model for resistance protein prediction.
Used alignment-based and alignment-free ensemble methods.
Web server development and standalone package.
Prediction and design of PDR proteins.
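To make the abstract's two main ideas concrete, the following minimal Python sketch shows one common compositional feature (amino acid composition) and one simple way a machine learning score might be combined with a BLAST hit. It is illustrative only and is not taken from the PlantDRPpred code: the function names, the `blast_hit` labels, and the `weight` value of 0.5 are all assumptions, and the abstract does not specify how the hybrid score is actually computed.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue (a 20-element list)."""
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def hybrid_score(ml_probability, blast_hit=None, weight=0.5):
    """Combine an ML probability with a BLAST hit (hypothetical scheme).

    blast_hit is "positive" if the top hit is a known PDR protein,
    "negative" if it is a known non-PDR protein, and None if BLAST
    returned no significant hit. The additive weight is an assumption.
    """
    if blast_hit == "positive":
        return ml_probability + weight
    if blast_hit == "negative":
        return ml_probability - weight
    return ml_probability


if __name__ == "__main__":
    features = amino_acid_composition("MKTAYIAKQR")
    print(len(features), round(sum(features), 6))
    print(hybrid_score(0.6, "positive"))
```

In this sketch the composition vector would be fed to a classifier, and the BLAST term only shifts the classifier's score when a significant hit exists, so sequences without close homologs still receive a prediction.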
Publisher
Cold Spring Harbor Laboratory