Affiliation:
1. Biology Department, University of Puerto Rico—Rio Piedras, San Juan, PR 00901, USA
Abstract
RaTG13 is phylogenomically the coronavirus most closely related to SARS-CoV-2; consequently, establishing the provenance of this high-value genome sequence is important for understanding the origin of SARS-CoV-2. While RaTG13 was described as being generated from a Rhinolophus affinis fecal swab obtained from a mine in Mojiang, Yunnan, numerous investigators have pointed out that this description is inconsistent with the low proportion of bacterial reads in the sequencing dataset. Metagenomic analysis confirms that only 10.3% of small-subunit (SSU) rRNA sequences in the dataset are bacterial, a proportion that is inconsistent with a fecal sample. In addition, the bacterial taxa present in the sample are shown to be inconsistent with fecal material. Assembly of the mitochondrial SSU rRNA sequences in the dataset produces a sequence 98.7% identical to R. affinis mitochondrial SSU rRNA, indicating that the sample was generated from R. affinis or a closely related species. Furthermore, 87.5% of the reads in the dataset map to the Rhinolophus ferrumequinum genome, and 62.2% of these mapped reads map to protein-coding genes, indicating that the dataset represents a Rhinolophus sp. transcriptome rather than a fecal swab sample. Differential gene expression analysis reveals that the pattern of expressed genes in the RaTG13 dataset is similar to that of RaTG15, which was also collected from the Mojiang mine. GO enrichment analysis reveals the overexpression of spermatogenesis- and olfaction-related genes in both datasets. This observation is consistent with sampling from a mating plug, a structure found in female rhinolophid bats, and suggests that RaTG13 was mis-sampled from such a plug. A validated natural provenance of the RaTG13 dataset throws into relief the unusual features of the SARS-CoV-2 genome.
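As a reading aid for the two mapping percentages above, the short sketch below works out what they imply together. It assumes that the 62.2% figure is a fraction of the genome-mapped reads (the "of these" in the abstract) and not of the whole dataset; the function and variable names are hypothetical and are not taken from the study's analysis pipeline.

```python
def fraction_of_total(mapped_fraction: float, coding_fraction_of_mapped: float) -> float:
    """Convert a nested fraction into a fraction of all reads.

    mapped_fraction: share of all reads that map to the reference genome.
    coding_fraction_of_mapped: share of those mapped reads that fall in
        protein-coding genes (assumed to be relative to mapped reads).
    """
    return mapped_fraction * coding_fraction_of_mapped


# Values quoted in the abstract.
mapped = 0.875            # reads mapping to the R. ferrumequinum genome
coding_of_mapped = 0.622  # of those, reads mapping to protein-coding genes

coding_of_total = fraction_of_total(mapped, coding_of_mapped)
print(f"Reads mapping to protein-coding genes: {coding_of_total:.1%} of the dataset")
```

Under that assumption, roughly 54% of all reads in the dataset fall within protein-coding genes of the reference genome, which is the sense in which the abstract describes the data as a transcriptome.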