Abstract
The public health system is extremely dependent on the use of vaccines to immunize the population from a series of infectious and dangerous diseases, preventing the system from collapsing and millions of people dying every year. However, to develop these vaccines and effectively monitor these diseases, it is necessary to use accurate diagnostic methods capable of identifying highly immunogenic regions within a given pathogenic protein. Existing experimental methods are expensive, time-consuming, and require arduous laboratory work, as they require the screening of a large number of potential candidate epitopes, making the methods extremely laborious, especially for application to larger microorganisms. In the last decades, researchers have developed in silico prediction methods, based on machine learning, to identify these markers, to drastically reduce the list of potential candidate epitopes for experimental tests, and, consequently, to reduce the laborious task associated with their mapping. Despite these efforts, the tools and methods still have low accuracy, slow diagnosis, and offline training. Thus, we develop a method to predict B-cell linear epitopes which are based on a Fuzzy-ARTMAP neural network architecture, called BepFAMN (B Epitope Prediction Fuzzy ARTMAP Artificial Neural Network). This was trained using a linear averaging scheme on 15 properties that include an amino acid ratio scale and a set of 14 physicochemical scales. The database used was obtained from the IEDB website, from which the amino acid sequences with the annotations of their positive and negative epitopes were taken. To train and validate the knowledge models, five-fold cross-validation and competition techniques were used. The BepiPred-2.0 database, an independent database, was used for the tests. In our experiment, the validation dataset reached sensitivity = 91.50%, specificity = 91.49%, accuracy = 91.49%, MCC = 0.83, and an area under the curve (AUC) ROC of approximately 0.9289. The result in the testing dataset achieves a significant improvement, with sensitivity = 81.87%, specificity = 74.75%, accuracy = 78.27%, MCC = 0.56, and AOC = 0.7831. These achieved values demonstrate that BepFAMN outperforms all other linear B-cell epitope prediction tools currently used. In addition, the architecture provides mechanisms for online training, which allow the user to find a new B-cell linear epitope, and to improve the model without need to re-train itself with the whole dataset. This fact contributes to a considerable reduction in the number of potential linear epitopes to be experimentally validated, reducing laboratory time and accelerating the development of diagnostic tests, vaccines, and immunotherapeutic approaches.
Subject
Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry