Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes

Published: 2022-12-03

Author:

Ikemoto Ko, Fujimoto Hinano, Fujimoto Akihiro

Abstract

Background: Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, it remains hard to characterize repetitive sequences by reconstructing genomic structures at high resolution solely from long reads. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads.

Methods: We first developed LoMA by combining minimap2, MAFFT, and our own algorithm, which classifies diploid haplotypes based on structural variants and constructs CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of human insertions solely from the long-read data.

Results: The assessment of LoMA showed that the CSs were highly accurate (error rate < 0.3%) compared with the raw data (error rate > 8%) and that LoMA was superior to the method of a previous study. The genome-wide analysis of NA18943 and NA19240 identified 5,516 and 6,542 insertions (≥ 100 bp), respectively. Most insertions (~80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Further, our analysis suggested that short tandem duplications were associated with gene expression and transposons.

Conclusions: Our analysis showed that LoMA constructs high-quality sequences from long reads with substantial errors. This study revealed the true structures of insertions with high accuracy and inferred mechanisms for the insertions. Our approach contributes to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma.
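The Methods describe building consensus sequences from long reads that have already been aligned to one another. As a rough illustration of why a consensus can be far more accurate than any single noisy read, the sketch below takes a column-wise majority vote over a toy multiple alignment. It is not LoMA's algorithm: the function name `majority_consensus`, the gap symbol, and the example reads are all assumptions made for illustration, and the input is assumed to be equal-length aligned rows such as a multiple aligner would produce.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Return a consensus from equal-length aligned rows by majority vote.

    Hypothetical helper for illustration only; not taken from LoMA.
    Columns whose most common symbol is the gap character are dropped.
    Ties are resolved in favor of the symbol seen first in the column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(row) != length for row in aligned_reads):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Toy alignment: each row carries at most one error (insertion,
# substitution, or deletion) relative to the others.
reads = [
    "ACGT-ACGT",
    "ACGTTACGT",  # spurious inserted T
    "ACCT-ACGT",  # substitution G -> C
    "ACGT-AC-T",  # deleted G
    "ACGT-ACGT",
]
consensus = majority_consensus(reads)
print(consensus)
```

Because each error in the toy reads falls in a different column, every column is still dominated by the correct symbol, so the isolated errors are voted out. Real long-read errors are not independent in this way, which is one reason a practical tool needs more than a plain majority vote.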

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. Long-read sequencing reveals the complex structure of extra dic(21;21) chromosome and its biological changes; 2023-04-20