Abstract
New sublineages of SARS-CoV-2 variants of concern (VOCs) continuously emerge with mutations in the spike glycoprotein. In most cases, the sublineage-defining mutations differ between VOCs. It is unclear whether these differences reflect lineage-specific likelihoods of mutation at each spike position or the stochastic nature of their appearance. Here we show that SARS-CoV-2 lineages have distinct evolutionary spaces (a probabilistic definition of the sequence states that can be occupied by expanding virus subpopulations). This space can be accurately inferred from the patterns of amino acid variability at the whole-protein level. Robust networks of co-variable sites identify the highest-likelihood mutations in new VOC sublineages and accurately predict the emergence of subvariants with resistance mutations to COVID-19 therapeutics. Our studies reveal that low-frequency variant patterns at heterologous sites across the protein contribute to accurate prediction of the changes at each position of interest.
Funder
amfAR, The Foundation for AIDS Research
Division of Microbiology and Infectious Diseases, National Institute of Allergy and Infectious Diseases
Publisher
Public Library of Science (PLoS)