Abstract
The global impact of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has led to considerable interest in detecting novel beneficial mutations and other genomic changes that may signal the development of variants of concern (VOCs). The ability to accurately detect these changes within individual patient samples is important in enabling early detection of VOCs. Such genomic scans for positive selection are best performed via comparison of empirical data to simulated data wherein evolutionary factors, including mutation and recombination rates, reproductive and infection dynamics, and purifying and background selection, can be carefully accounted for and parameterized. While there has been work to quantify these factors in SARS-CoV-2, they have yet to be integrated into a baseline model describing intra-host evolutionary dynamics. To construct such a baseline model, we develop a simulation framework that enables one to establish expectations for underlying levels and patterns of patient-level variation. By varying eight key parameters, we evaluated 12,096 different model-parameter combinations and compared them to existing empirical data. Of these, 592 models (∼5%) were plausible based on the resulting mean expected number of segregating variants. These plausible models shared several commonalities shedding light on intra-host SARS-CoV-2 evolutionary dynamics: severe infection bottlenecks, low levels of reproductive skew, and a distribution of fitness effects skewed towards strongly deleterious mutations. We also describe important areas of model uncertainty and highlight additional sequence data that may help to further refine a baseline model.
This study lays the groundwork for the improved analysis of existing and future SARS-CoV-2 within-patient data.
Significance Statement
Despite its tremendous impact on human health, a comprehensive evolutionary baseline model has yet to be developed for studying the within-host population genomics of SARS-CoV-2. Importantly, such modeling would enable improved analysis and provide insights into the key evolutionary dynamics governing SARS-CoV-2 evolution. Given this need, we have here quantified a set of plausible baseline models via large-scale simulation. The commonly shared features of these relevant models, including severe infection bottlenecks, low levels of progeny skew, and a high rate of strongly deleterious mutations, lay the foundation for sophisticated analyses of SARS-CoV-2 evolution within patients using these baseline models.
Publisher
Cold Spring Harbor Laboratory