Affiliation:
1. Department of Computer Science, North Dakota State University, Fargo, ND 58106, USA
Abstract
The pandemic caused by SARS-CoV-2 has had a significant worldwide impact. In one theory of the origin of SARS-CoV-2, pangolins are considered a potential intermediate host. To assemble the genome of the suspected coronavirus (CoV) found in pangolins, most previous studies used SARS-CoV-2 as the reference, implicitly assuming that the pangolin CoV and SARS-CoV-2 are each other's closest evolutionary neighbors. However, this assumption may not hold. We investigated how the choice of reference genome affects the resulting CoV genome assembly. We tested various representative CoVs as the reference genome and found significant differences among the resulting assemblies. The assembly obtained using RaTG13 as the reference showed better total length, N50, and pairwise distance reconstruction (PDR) scores than the assembly guided by SARS-CoV-2, indicating that RaTG13 may be a better reference. Therefore, RaTG13 should also be considered as a reference when assembling suspected CoVs found in pangolins and other potential intermediate hosts.
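As a minimal illustration of two of the assembly statistics named in the abstract, the sketch below computes total length and N50 from a list of contig lengths. It uses the standard N50 definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length). The function name `assembly_stats` and the example contig lengths are hypothetical and not taken from the study; the PDR score is not reproduced here because the abstract does not define it.

```python
def assembly_stats(contig_lengths):
    """Return (total_length, n50) for a list of contig lengths.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # Walk down the sorted contigs until the running sum covers
    # at least half of the total assembly length.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return total, n50


if __name__ == "__main__":
    # Made-up contig lengths for two assemblies of the same reads,
    # each guided by a different reference genome.
    assembly_a = [9000, 7000, 5000, 3000]
    assembly_b = [6000, 4000, 4000, 2000, 2000]
    print("assembly A:", assembly_stats(assembly_a))
    print("assembly B:", assembly_stats(assembly_b))
```

With equal read data, a larger total length and a larger N50 generally indicate a more complete and more contiguous assembly, which is the sense in which the abstract compares the reference-guided assemblies.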
Publisher
World Scientific Pub Co Pte Lt
Subject
Computer Science Applications, Molecular Biology, Biochemistry