Author:
Biswas Sourav, Saha Suparna, Bandyopadhyay Sanghamitra, Bhattacharyya Malay
Abstract
With the number of available SARS-CoV-2 sequences increasing day by day, new genomic information is being revealed. As SARS-CoV-2 sequences show wide changes across samples, we aim to explore whether these changes reveal the geographical origin of the corresponding samples. The k-mer distributions, denoting normalized frequency counts of all possible combinations of nucleotides of size up to k, are often helpful for exploring sequence-level patterns. Given that the SARS-CoV-2 sequences are highly imbalanced by geographical origin (with a relatively higher number of samples collected from the USA), we observe that, with proper under-sampling, the k-mer distributions of the SARS-CoV-2 sequences predict their geographical origin with more than 90% accuracy. The experiments are performed on samples collected from the six countries with the maximum number of sequences available up to July 07, 2020: Australia, USA, China, India, Greece and France. Moreover, we demonstrate that the changes in genomic sequences characterize the continents as a whole. We also highlight that the network motifs present in the sequence similarity networks differ significantly across these countries. Taken as a whole, this approach is capable of predicting the geographical shift of SARS-CoV-2.
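To make the k-mer distribution concrete, the following is a minimal Python sketch of one way such a feature vector could be computed. It is not the authors' implementation: the function name `kmer_distribution`, the choice to normalize separately for each word length, and the decision to skip windows containing ambiguous bases (such as N) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides are counted


def kmer_distribution(sequence, k):
    """Normalized frequencies of all nucleotide words of length 1..k.

    Hypothetical helper: frequencies are normalized per word length,
    so the values for each length sum to 1 (an illustrative choice).
    """
    seq = sequence.upper()
    features = {}
    for size in range(1, k + 1):
        # Count every overlapping window of the current length.
        counts = Counter(seq[i:i + size] for i in range(len(seq) - size + 1))
        # Enumerate all 4**size possible words so absent words get 0.0.
        words = ["".join(p) for p in product(ALPHABET, repeat=size)]
        # Windows with ambiguous bases are not in `words` and are ignored.
        total = sum(counts[w] for w in words)
        for w in words:
            features[w] = counts[w] / total if total else 0.0
    return features


if __name__ == "__main__":
    dist = kmer_distribution("ACGTAC", 2)
    print(len(dist), dist["AC"])
```

With k = 2 the vector has 4 + 16 = 20 entries, one per possible word. Vectors of this kind could then be passed to any standard classifier after the majority class has been under-sampled, although the abstract does not specify which classifier or sampling scheme was used.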
Publisher
Cold Spring Harbor Laboratory