Accurate sequencing of DNA motifs able to form alternative (non-B) structures

Published:2023-06 Issue:6 Volume:33 Page:907-922
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Weissensteiner Matthias H., Cremona Marzia A., Guiblet Wilfried M., Stoler Nicholas, Harris Robert S., Cechova Monika, Eckert Kristin A., Chiaromonte Francesca, Huang Yi-Fei, Makova Kateryna D.

Abstract

Approximately 13% of the human genome, at certain motifs, has the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might have increased error rates at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types except Z-DNA in Illumina and HiFi, but only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.

Funder

National Institutes of Health

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical),Genetics

References (61 articles)

1. A global reference for human genetic variation

2. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries

3. The Statistical Analysis of Compositional Data

4. Structural origins of adenine-tract bending

5. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Cited by 9 articles.

1. Non-canonical DNA in human and other ape telomere-to-telomere genomes;2024-09-03

2. Complete sequencing of ape genomes;2024-07-31

3. The complete sequence and comparative analysis of ape sex chromosomes;Nature;2024-05-29

4. Special Issue “Bioinformatics of Unusual DNA and RNA Structures”;International Journal of Molecular Sciences;2024-05-10

5. Non-B-form DNA is associated with centromere stability in newly-formed polyploid wheat;Science China Life Sciences;2024-04-16