The complete and fully-phased diploid genome of a male Han Chinese
Published:2023-07-14
Issue:10
Volume:33
Page:745-761
ISSN:1748-7838
Container-title:Cell Research
language:en
Short-container-title:Cell Res
Author:
Yang Chentao, Zhou Yang, Song Yanni, Wu Dongya, Zeng Yan, Nie Lei, Liu Panhong, Zhang Shilong, Chen Guangji, Xu Jinjin, Zhou Hongling, Zhou Long, Qian Xiaobo, Liu Chenlu, Tan Shangjin, Zhou Chengran, Dai Wei, Xu Mengyang, Qi Yanwei, Wang Xiaobo, Guo Lidong, Fan Guangyi, Wang Aijun, Deng Yuan, Zhang Yong, Jin Jiazheng, He Yunqiu, Guo Chunxue, Guo Guoji, Zhou Qing, Xu Xun, Yang Huanming, Wang Jian, Xu Shuhua, Mao Yafei, Jin Xin, Ruan Jue, Zhang Guojie
Abstract
Since the release of the complete human genome, the priority of human genomic study has been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as the reference, respectively, showed that reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled with different reference genomes. Finally, applying CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb of putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian-specific ones undetected using CHM13 as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Funder
International Institutes of Medicine at Yiwu and Kunpeng Fellowship National Key Research and Development Project Program of China
Publisher
Springer Science and Business Media LLC
Subject
Cell Biology,Molecular Biology
Reference
130 articles.
Cited by
20 articles.