Abstract
Abstract
Background
Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease.
Methods
We sequenced the genomes of eight parent–child trios using high coverage PacBio HiFi LRS (~ 30-fold coverage) and Illumina short-read sequencing (SRS) (~ 50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing.
Results
We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100% respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs and 8 (19%) were confirmed as true de novo event. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNM. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data.
Conclusions
HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs.
Publisher
Springer Science and Business Media LLC
Subject
Genetics (clinical),Genetics,Molecular Biology,Molecular Medicine
Reference59 articles.
1. Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA. Clan genomics and the complex architecture of human disease. Cell. 2011 Sep 30 [Cited 2022 Mar 24];147(1):32–43. Available from: https://pubmed.ncbi.nlm.nih.gov/21962505/
2. Acuna-Hidalgo R, Veltman JA, Hoischen A. New insights into the generation and role of de novo mutations in health and disease. Genome Biol. 2016 Nov 28 [Cited 2022 Mar 24];17(1). Available from: https://pubmed.ncbi.nlm.nih.gov/27894357/
3. Veltman JA, Brunner HG. De novo mutations in human genetic disease. Nat Rev Genet. 2012 Aug [Cited 2022 May 29];13(8):565–75. Available from: https://pubmed.ncbi.nlm.nih.gov/22805709/
4. Pauper M, Kucuk E, Wenger AM, Chakraborty S, Baybayan P, Kwint M, et al. Long-read trio sequencing of individuals with unsolved intellectual disability. Eur J Hum Genet 2020 294. 2020 Nov 30 [Cited 2022 Apr 11];29(4):637–48. Available from: https://www.nature.com/articles/s41431-020-00770-0
5. Noyes MD, Harvey WT, Porubsky D, Sulovari A, Li R, Rose NR, et al. Familial long-read sequencing increases yield of de novo mutations. Am J Hum Genet. 2022 Mar [Cited 2022 Mar 24];0(0). Available from: http://www.cell.com/article/S0002929722000659/fulltext
Cited by
16 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献