Abstract
Abstract
Background
Influenza B virus can cause epidemics with high pathogenicity, so it poses a serious threat to public health. A feature representation algorithm is proposed in this paper to identify the pathogenicity phenotype of influenza B virus.
Methods
The dataset included all 11 influenza virus proteins encoded in eight genome segments of 1724 strains. Two types of features were hierarchically used to build the prediction model. Amino acid features were directly delivered from 67 feature descriptors and input into the random forest classifier to output informative features about the class label and probabilistic prediction. The sequential forward search strategy was used to optimize the informative features. The final features for each strain had low dimensions and included knowledge from different perspectives, which were used to build the machine learning model for pathogenicity identification.
Results
The 40 signature positions were achieved by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were directly generated from the 67 random forest models, the dimensions for class and probabilistic features were optimized as 4 and 3, respectively. The optimal class features had a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features had a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed the original informative features and amino acid features from individual descriptors. The sequential forward search strategy had better performance than the classical ensemble method.
Conclusions
The optimized informative features had the best performance and were used to build a predictive model so as to identify the phenotype of influenza B virus with high pathogenicity and provide early risk warning for disease control.
Graphical Abstract
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Infectious Diseases,Public Health, Environmental and Occupational Health,General Medicine
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献