Abstract
AbstractThe severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evolved many high-risk variants, resulting in repeated COVID-19 waves of pandemic during the past years. Therefore, accurate early-warning of high-risk variants is vital for epidemic prevention and control. Here we construct a machine learning model to predict high-risk variants of SARS-CoV-2 by LightGBM algorithm based on several important haplotype network features. As demonstrated on a series of different retrospective testing datasets, our model achieves accurate prediction of all variants of concern (VOC) and most variants of interest (AUC=0.96). Prediction based on the latest sequences shows that the newly emerging lineage BA.5 has the highest risk score and spreads rapidly to become a major epidemic lineage in multiple countries, suggesting that BA.5 bears great potential to be a VOC. In sum, our machine learning model is capable to early predict high-risk variants soon after their emergence, thus greatly improving public health preparedness against the evolving virus.
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献