Abstract
Nuclear inserts derived from mitochondrial DNA (Numts) encode valuable information. Being mostly non-functional, and accumulating mutations more slowly than mitochondrial sequence, they act like molecular fossils: they preserve information on the ancestral sequences of the mitochondrial DNA. In addition, changes to Numt sequences since their insertion into the nuclear genome carry information about the nuclear phylogeny. These attributes cannot be reliably exploited if Numt sequence is confused with the mitochondrial genome (mtDNA). The analysis of mtDNA would be similarly compromised by any such confusion, for example by producing misleading results in DNA barcoding based on mtDNA sequence. We propose a method to distinguish Numts from mtDNA without the need for comprehensive assembly of the nuclear genome or the physical separation of organelles and nuclei. It exploits the different biases of long- and short-read sequencing. We find that short-read data yield mainly mtDNA sequences, whereas long-read sequencing strongly enriches for Numt sequences. We demonstrate the method using genome-skimming (coverage < 1x) data obtained with Illumina short-read and PacBio long-read technology from DNA extracted from six grasshopper individuals. The mitochondrial genome sequences were assembled from the short-read data despite the presence of Numts. The PacBio data contained a much higher proportion of Numt reads (over 16-fold), so we caution against the use of long-read methods for studies using mitochondrial loci. We obtained two estimates of the genomic proportion of Numts. Finally, we introduce “tangle plots”, a way of visualising Numt structural rearrangements and comparing them between samples.
Publisher
Cold Spring Harbor Laboratory