Abstract
AbstractMutations primarily in the Spike (S) gene resulted in the emergence of many SARS-CoV-2 variants like Alpha, Beta, Delta and Omicron variants. This has also caused a number of COVID-19 pandemic waves which have impacted human lives in different ways due to restriction measures put in place to curb the spread of the virus. In this study, evolutionary patterns found in SARS-CoV-2 sequences of samples collected from Zimbabwean COVID-19 patients were investigated. High coverage SARS-CoV-2 whole genome sequences were downloaded from the GISAID database along with the GISAID S gene reference sequence. Biopython, NumPy and Pandas Data Science packages were used to load, slice and clean whole genome sequences outputting a fasta file with approximate Spike (S) gene sequences. Alignment of sliced dataset with GISAID reference sequence was done using Jalview 2.11.1.3 to find exact sequences of SARS-CoV-2 S gene. Evidence of recombination signals was investigated using RDP 4.1 and pervasive selection in the S gene was investigated using FUBAR algorithm hosted on the Datamonkey webserver. Matplotlib and Seaborn Python packages were used for Data Visualisation. A plot of Bayes factor hypothesizing non-synonymous substitution being greater than synonymous substitution (β > α) in the S protein sites showed 3 peaks with evidence of strong divergence. These 3 diverging S protein sites were found to be D142G, D614G and P681R. No evidence of recombination was detected by 9 methods of RDP which use different approaches to detect recombination signals. This study is useful in guiding drug, vaccine and diagnostic innovations toward better control of the pandemic. Additionally, this study can guide other non-biological interventions as we better understand the changes in various viral characteristics driven by the observed evolutionary patterns.
Publisher
Cold Spring Harbor Laboratory