Automated assembly of high-quality diploid human reference genomes

Published: 2022-03-06

Author:

Jarvis Erich D., Formenti Giulio, Rhie Arang, Guarracino Andrea, Yang Chentao, Wood Jonathan, Tracey Alan, Thibaud-Nissen Francoise, Vollger Mitchell R., Porubsky David, Cheng Haoyu, Asri Mobin, Logsdon Glennis A., Carnevali Paolo, Chaisson Mark J.P., Chin Chen-Shan, Cody Sarah, Collins Joanna, Ebert Peter, Escalona Merly, Fedrigo Olivier, Fulton Robert S., Fulton Lucinda L., Garg Shilpa, Ghurye Jay, Granat Ana, Green Edward, Hall Ira, Harvey William, Hasenfeld Patrick, Hastie Alex, Haukness Marina, Jaeger Erich B., Jain Miten, Kirsche Melanie, Kolmogorov Mikhail, Korbel Jan O., Koren Sergey, Korlach Jonas, Lee Joyce, Li Daofeng, Lindsay Tina, Lucas Julian, Luo Feng, Marschall Tobias, McDaniel Jennifer, Nie Fan, Olsen Hugh E., Olson Nathan D., Pesout Trevor, Puiu Daniela, Regier Allison, Ruan Jue, Salzberg Steven L., Sanders Ashley D., Schatz Michael C., Schmitt Anthony, Schneider Valerie A., Selvaraj Siddarth, Shafin Kishwar, Shumate Alaina, Stober Catherine, Torrance James, Wagner Justin, Wang Jianxin, Wenger Aaron, Xiao Chuanle, Zimin Aleksey V., Zhang Guojie, Wang Ting, Li Heng, Garrison Erik, Haussler David, Zook Justin M., Eichler Evan E., Phillippy Adam M., Paten Benedict, Howe Kerstin, Miga Karen H.

Abstract

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has greatly benefited society [1, 2]. However, it still has many gaps and errors, and does not represent a biological human genome since it is a blend of multiple individuals [3, 4]. Recently, a high-quality telomere-to-telomere reference genome, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a duplicate genome, and is thus nearly homozygous [5]. To address these limitations, the Human Pangenome Reference Consortium (HPRC) recently formed with the goal of creating a collection of high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and automated assembly approaches yields the most complete, accurate, and cost-effective diploid genome assemblies with minimal manual curation. Approaches that used highly accurate long reads and parent-child data to sort haplotypes during assembly outperformed those that did not. Developing a combination of all the top performing methods, we generated our first high-quality diploid reference assembly, containing only ∼4 gaps (range 0-12) per chromosome, most within ±1% of CHM13’s length. Nearly 1/4th of protein coding genes have synonymous amino acid changes between haplotypes, and centromeric regions showed the highest density of variation. Our findings serve as a foundation for assembling near-complete diploid human genomes at the scale required for constructing a human pangenome reference that captures all genetic variation from single nucleotides to large structural rearrangements.

Publisher

Cold Spring Harbor Laboratory

References (84 articles)

1. Initial sequencing and analysis of the human genome

2. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

3. Pan-genomics in the human genome era;Nat. Rev. Genet.;2020

4. Long-read human genome sequencing and its applications;Nat. Rev. Genet.;2020

5. The complete sequence of a human genome

Cited by 17 articles.

1. Genomic structural variation: A complex but important driver of human evolution;American Journal of Biological Anthropology;2023-02-16

2. The Telomere-Telomerase System Is Detrimental to Health at High-Altitude;International Journal of Environmental Research and Public Health;2023-01-20

3. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation;2023-01-15

4. NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads;BMC Bioinformatics;2022-12-16

5. Comparing Genomic and Epigenomic Features across Species Using the WashU Comparative Epigenome Browser;2022-12-02