The landscape of genomic structural variation in Indigenous Australians
Author:
Reis Andre L. M., Rapadas Melissa, Hammond Jillian M., Gamaarachchi Hasindu, Stevanovski Igor, Ayuputeri Kumaheri Meutia, Chintalaphani Sanjog R., Dissanayake Duminda S. B., Siggs Owen M., Hewitt Alex W., Llamas Bastien, Brown Alex, Baynam Gareth, Mann Graham J., McMorran Brendan J., Easteal Simon, Hermes Azure, Jenkins Misty R., Pearson Glen, Roe Yvette, Mohamed Janine, Murray Ben, Ormond-Parker Lyndon, Kneipp Erica, Nugent Keith, Mann Graham, Patel Hardip R., Deveson Ira W.
Abstract
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1–3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion–deletion variants (20–49 bp; n = 136,797), structural variants (50 bp–50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
Publisher
Springer Science and Business Media LLC
Subject
Multidisciplinary
References (53 articles)
1. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
2. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
3. Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
4. De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).
5. Chintalaphani, S. R., Pineda, S. S., Deveson, I. W. & Kumar, K. R. An update on the neurological short tandem repeat expansion disorders and the emergence of long-read sequencing diagnostics. Acta Neuropathol. Commun. 9, 98 (2021).
Cited by
1 article.