De novogenome assemblies from two Indigenous Americans from Arizona identify new polymorphisms in non-reference sequences

Author:

Köroğlu ÇiğdemORCID,Chen PengORCID,Traurig Michael,Altok Serdar,Bogardus Clifton,Baier Leslie J

Abstract

ABSTRACTThere is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome which is inadequate for non-European ancestries. To help address this issue, we created a modified hg38 reference map usingde novosequence assemblies from Indigenous Americans living in Arizona (IAZ). Using HiFi SMRT long-read sequencing technology, we generatedde novogenome assemblies for one female and one male IAZ individual. Each assembly included ∼17 Mb of DNA sequence not present (non-reference sequence; NRS) in hg38, which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with WGS sequencing data from 387 IAZ cohorts using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located inHCN279 bp downstream of exon 3 and contains several putative transcriptional regulatory elements. Genotyping of theHCN2-NRS revealed that the insertion is enriched in IAZ (MAF = 0.45) compared to Caucasians (MAF = 0.15) and African Americans (MAF = 0.03). This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in an under-represented ethnic groups and thereby lead to the discovery of previously missed common variations.AUTHOR SUMMARYGRCh38/hg38 reference genome has been the standard reference for large-scale human genetics studies. However, it does not adequately represent sequences of non-European ancestry. In this study, using long-read sequencing technology, we constructedde novosequence assemblies from two Indigenous Americans from Arizona. We then compared thede novoassemblies to the hg38 reference genome to identify non-reference sequences (NRSs). We integrated these NRSs into our whole-genome sequencing (WGS) variant calling pipeline to improve read alignment and variant detection. We also directly assessed the NRSs positioned within genes. Inclusion of population-specific NRSs dramatically changed the variant profile of our study group with under-represented ethnicity, revealing common variation not detected by our previous population-level WGS and genotyping studies.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3