Towards complete and error-free genome assemblies of all vertebrate species
Author:
Rhie Arang, McCarthy Shane A., Fedrigo Olivier, Damas Joana, Formenti Giulio, Koren Sergey, Uliano-Silva Marcela, Chow William, Fungtammasan Arkarachai, Kim Juwan, Lee Chul, Ko Byung June, Chaisson Mark, Gedman Gregory L., Cantin Lindsey J., Thibaud-Nissen Francoise, Haggerty Leanne, Bista Iliana, Smith Michelle, Haase Bettina, Mountcastle Jacquelyn, Winkler Sylke, Paez Sadye, Howard Jason, Vernes Sonja C., Lama Tanya M., Grutzner Frank, Warren Wesley C., Balakrishnan Christopher N., Burt Dave, George Julia M., Biegler Matthew T., Iorns David, Digby Andrew, Eason Daryl, Robertson Bruce, Edwards Taylor, Wilkinson Mark, Turner George, Meyer Axel, Kautt Andreas F., Franchini Paolo, Detrich H. William, Svardal Hannes, Wagner Maximilian, Naylor Gavin J. P., Pippel Martin, Malinsky Milan, Mooney Mark, Simbirsky Maria, Hannigan Brett T., Pesout Trevor, Houck Marlys, Misuraca Ann, Kingan Sarah B., Hall Richard, Kronenberg Zev, Sović Ivan, Dunn Christopher, Ning Zemin, Hastie Alex, Lee Joyce, Selvaraj Siddarth, Green Richard E., Putnam Nicholas H., Gut Ivo, Ghurye Jay, Garrison Erik, Sims Ying, Collins Joanna, Pelan Sarah, Torrance James, Tracey Alan, Wood Jonathan, Dagnew Robel E., Guan Dengfeng, London Sarah E., Clayton David F., Mello Claudio V., Friedrich Samantha R., Lovell Peter V., Osipova Ekaterina, Al-Ajli Farooq O., Secomandi Simona, Kim Heebal, Theofanopoulou Constantina, Hiller Michael, Zhou Yang, Harris Robert S., Makova Kateryna D., Medvedev Paul, Hoffman Jinna, Masterson Patrick, Clark Karen, Martin Fergal, Howe Kevin, Flicek Paul, Walenz Brian P., Kwak Woori, Clawson Hiram, Diekhans Mark, Nassar Luis, Paten Benedict, Kraus Robert H. S., Crawford Andrew J., Gilbert M. Thomas P., Zhang Guojie, Venkatesh Byrappa, Murphy Robert W., Koepfli Klaus-Peter, Shapiro Beth, Johnson Warren E., Di Palma Federica, Marques-Bonet Tomas, Teeling Emma C., Warnow Tandy, Graves Jennifer Marshall, Ryder Oliver A., Haussler David, O’Brien Stephen J., Korlach Jonas, Lewin Harris A., Howe Kerstin, Myers Eugene W., Durbin Richard, Phillippy Adam M., Jarvis Erich D.
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Publisher
Springer Science and Business Media LLC
Subject
Multidisciplinary
Cited by
1487 articles.