Abstract
Predicting the evolutionary patterns of emerging and endemic viruses is key for mitigating their spread in host populations. In particular, it is critical to rapidly identify mutations with the potential for immune escape or increased disease burden (variants of concern). Knowing which circulating mutations are such variants of concern can inform treatment or mitigation strategies such as alternative vaccines or targeted social distancing. A recent study proposed that variants of concern can be identified using two quantities extracted from protein language models, grammaticality and semantic change. These quantities are defined in analogy to concepts from natural language processing. Grammaticality is intended to be a measure of whether a variant viral protein is viable, and semantic change is intended to be a measure of potential for immune escape. Here, we systematically test this hypothesis, taking advantage of several high-throughput datasets that have become available, and also testing additional machine learning models for calculating the grammaticality metric. We find that grammaticality can be a measure of protein viability, though the more traditional metric ΔΔG appears to be more effective. By contrast, we do not find compelling evidence that semantic change is a useful tool for identifying immune escape mutations.
Publisher
Cold Spring Harbor Laboratory