Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera

Author:

de Flamingh Alida (1, 2), Rivera-Colón Angel G (3), Gnoske Tom P (4), Kerbis Peterhans Julian C (4, 5, 6), Catchen Julian (3), Malhi Ripan S (1, 2, 7), Roca Alfred L (1, 2, 8)

Affiliation:

1. Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, United States

2. Program in Ecology, Evolution and Conservation Biology, UIUC, Urbana, IL, United States

3. Department of Evolution, Ecology and Behavior, UIUC, Urbana, IL, United States

4. Field Museum of Natural History (FMNH), Chicago, IL, United States

5. Science & Education, Field Museum of Natural History, Chicago, IL, United States

6. College of Arts & Science, Roosevelt University, Chicago, IL, United States

7. Department of Anthropology, UIUC, Urbana, IL, United States

8. Department of Animal Sciences, UIUC, Urbana, IL, United States

Abstract

Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which allows for the identification of DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from 2 ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice for ancient DNA studies and the genus Panthera is known to have numt pseudogenes. Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numts. We compared the efficacy of Numt Parser to 2 other bioinformatic approaches that can be used to account for numt contamination. We found that Numt Parser outperformed approaches that rely only on read alignment or Basic Local Alignment Search Tool (BLAST) properties, and was effective at identifying sequences that likely originated from numts while having minimal impacts on the recovery of cymt reads. Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences.
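The classification idea described in the abstract (compare each read against both a cymt and a numt reference, then sort reads into separate datasets) can be illustrated with a toy Python sketch. This is not Numt Parser's actual implementation or interface: the names `Read`, `mismatches`, `classify`, and `parse_reads`, the use of a simple mismatch count, and the assumption that each read has the same known placement on both references are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    sequence: str
    start: int  # hypothetical: 0-based placement, assumed identical on both references


def mismatches(read: str, reference: str, start: int) -> int:
    """Count mismatched bases between a read and the reference window it covers."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(1 for a, b in zip(read, window) if a != b)


def classify(read: Read, cymt_ref: str, numt_ref: str) -> str:
    """Label a read by whichever reference it matches more closely (assumed rule)."""
    cymt_mm = mismatches(read.sequence, cymt_ref, read.start)
    numt_mm = mismatches(read.sequence, numt_ref, read.start)
    if numt_mm < cymt_mm:
        return "numt"
    if cymt_mm < numt_mm:
        return "cymt"
    return "ambiguous"  # equally close to both references


def parse_reads(reads, cymt_ref: str, numt_ref: str) -> dict:
    """Sort read names into cymt, numt, and ambiguous bins."""
    bins = {"cymt": [], "numt": [], "ambiguous": []}
    for read in reads:
        bins[classify(read, cymt_ref, numt_ref)].append(read.name)
    return bins


# Toy references: the numt copy differs from the cymt sequence at positions 4 and 10.
cymt_ref = "ACGTACGTACGT"
numt_ref = "ACGTTCGTACCT"

reads = [
    Read("r1", "TACGT", 3),  # matches the cymt reference exactly
    Read("r2", "TTCGT", 3),  # carries the numt-specific base at position 4
    Read("r3", "ACGT", 0),   # covers a region where the references are identical
]

bins = parse_reads(reads, cymt_ref, numt_ref)
print(bins)
```

In this sketch, reads that fall in regions where the two references are identical cannot be assigned and are kept in a separate bin; how a real tool treats such reads is a design decision that the abstract does not specify.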

Funder

USAID Wildlife TRAPS Project

UIUC

Cooperative State Research, Education, and Extension Service

U.S. Department of Agriculture

Publisher

Oxford University Press (OUP)

Subject

Genetics (clinical), Genetics, Molecular Biology, Biotechnology

