Author:
Li Jing,Zhao Zhongpeng,Tai ChengZheng,Sun Ting,Tan Lingyun,Li Xinyu,He Wei,Li HongJun,Zhang Jing
Abstract
AbstractBackgroundThe viruses threats provoke concerns regarding their sustained epidemic transmission, making the development of vaccines particularly important. In the prolonged and costly process of vaccine development, the most important initial step is to identify protective immunogens. Machine learning (ML) approaches are productive in analyzing big data such as microbial proteomes, and can remarkably reduce the cost of experimental work in developing novel vaccine candidates.ResultsWe intensively evaluated the immunogenicity prediction power of eight commonly-used ML methods by random sampling cross validation on a large dataset consisting of known viral immunogens and non-immunogens we manually curated from the public domain. XGBoost, kNN and RF showed the strongest predictive power. We then proposed a novel soft-voting based ensemble approach (VirusImmu), which demonstrated a powerful and stable capability for viral immunogenicity prediction across the test set and external test set irrespective of protein sequence length. VirusImmu was successfully applied to facilitate identifying linear B cell epitopes against African Swine Fever Virus as confirmed by indirect ELISA in vitro.ConclusionsVirusImmu exhibited tremendous potentials in predicting immunogenicity of viral protein segments. It is freely accessible athttps://github.com/zhangjbig/VirusImmu.
Publisher
Cold Spring Harbor Laboratory