Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

Authors

Schneider Valerie A., Graves-Lindsay Tina, Howe Kerstin, Bouk Nathan, Chen Hsiu-Chuan, Kitts Paul A., Murphy Terence D., Pruitt Kim D., Thibaud-Nissen Françoise, Albracht Derek, Fulton Robert S., Kremitzki Milinn, Magrini Vincent, Markovic Chris, McGrath Sean, Steinberg Karyn Meltz, Auger Kate, Chow William, Collins Joanna, Harden Glenn, Hubbard Timothy, Pelan Sarah, Simpson Jared T., Threadgold Glen, Torrance James, Wood Jonathan M., Clarke Laura, Koren Sergey, Boitano Matthew, Peluso Paul, Li Heng, Chin Chen-Shan, Phillippy Adam M., Durbin Richard, Wilson Richard K., Flicek Paul, Eichler Evan E., Church Deanna M.

Abstract

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

Funders

National Institutes of Health

National Library of Medicine

Wellcome Trust

European Molecular Biology Laboratory

National Human Genome Research Institute

Howard Hughes Medical Institute

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics