MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads
Published: 2023-07-18
Issue: 1
Volume: 24
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics
Author:
Uliano-Silva Marcela, Ferreira João Gabriel R. N., Krasheninnikova Ksenia, Blaxter Mark, Mieszkowska Nova, Hall Neil, Holland Peter, Durbin Richard, Richards Thomas, Kersey Paul, Hollingsworth Peter, Wilson Willie, Twyford Alex, Gaya Ester, Lawniczak Mara, Lewis Owen, Broad Gavin, Martin Fergal, Hart Michelle, Barnes Ian, Formenti Giulio, Abueg Linelle, Torrance James, Myers Eugene W., Durbin Richard, Blaxter Mark, McCarthy Shane A.
Abstract
Background
PacBio high fidelity (HiFi) sequencing reads are both long (15–20 kb) and highly accurate (>Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes, the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing.
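To make the accuracy threshold concrete, the standard Phred relation Q = -10 log10(p) can be worked through in a few lines. This is only an illustrative sketch: the function names are hypothetical and the 15 kb read length is taken from the lower end of the range quoted above.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


# Q20 corresponds to one expected error per 100 bases (99% accuracy).
q20_error = phred_to_error_probability(20)

# Upper bound on expected errors in a 15 kb read sitting exactly at Q20.
# Reads above Q20 would have proportionally fewer expected errors.
expected_errors_15kb = q20_error * 15_000

print(f"Q20 per-base error probability: {q20_error}")
print(f"Expected errors in a 15 kb read at Q20: {expected_errors_15kb:.0f}")
```

Because the scale is logarithmic, each additional 10 Phred units lowers the per-base error probability tenfold.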
Results
MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence FASTA file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats.
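The two input modes described above (raw reads or assembled contigs) could be wrapped in a small helper that builds the command line. This is a hedged sketch, not part of MitoHiFi itself: the helper name and file names are hypothetical, and the flags (`-r`, `-c`, `-f`, `-g`, `-t`, `-o`) are recalled from the project's README and are not stated in this abstract, so they should be checked against `mitohifi.py -h` for the installed version.

```python
from typing import List, Optional


def build_mitohifi_command(
    reference_fasta: str,
    reference_genbank: str,
    reads: Optional[str] = None,
    contigs: Optional[str] = None,
    threads: int = 4,
    genetic_code: int = 2,
) -> List[str]:
    """Build an argument list for a MitoHiFi run (hypothetical helper).

    Exactly one of `reads` or `contigs` must be given, mirroring the
    "either raw reads or assembled contigs" input described in the abstract.
    Flag names are assumptions to verify against `mitohifi.py -h`.
    """
    if (reads is None) == (contigs is None):
        raise ValueError("Provide exactly one of 'reads' or 'contigs'.")

    # Assumed flags: -r for HiFi reads, -c for pre-assembled contigs.
    input_args = ["-r", reads] if reads is not None else ["-c", contigs]

    return [
        "mitohifi.py",
        *input_args,
        "-f", reference_fasta,    # assumed: related mitogenome, FASTA
        "-g", reference_genbank,  # assumed: related mitogenome, GenBank
        "-t", str(threads),       # assumed: number of threads
        "-o", str(genetic_code),  # assumed: NCBI genetic code table number
    ]


# Example with placeholder file names; the list can be passed to subprocess.run.
command = build_mitohifi_command(
    reference_fasta="related_mito.fasta",
    reference_genbank="related_mito.gb",
    reads="hifi_reads.fasta",
)
print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.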
Conclusions
MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
Funder
Wellcome Sanger Core Award; Wellcome Trust Darwin Tree of Life Discretionary Award
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
530 articles.