Probably Correct: Rescuing Repeats with Short and Long Reads

Published: 2020-12-31; Volume: 12; Issue: 1; Page: 48
ISSN: 2073-4425
Journal: Genes
Language: English

Author:

Monika Cechova

Abstract

Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, because an estimated 50–69% of the human genome is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., lacks a unique placement in the genome. The two key factors determining whether a read is multi-mapping are read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and to characterize chromosomes from “telomere to telomere”. Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, which aids the assembly process. This progress holds even though long reads still contain a modest percentage of sequencing errors, which reduce both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
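To make the claim that read length and genome complexity decide whether a read is multi-mapping concrete, the following is a minimal sketch under simplifying assumptions. It is not the author's method: it uses a hypothetical toy genome (a 30 bp tandem array of the 5-mer `ATTCG` between two C/G-only flanks), samples one error-free read at every start position, and counts exact matches with Python's `str.find`. Real aligners tolerate mismatches and report mapping quality instead. The names `count_occurrences`, `multimapping_fraction`, `LEFT_FLANK`, `REPEAT_ARRAY`, and `RIGHT_FLANK` are illustrative only.

```python
"""Toy illustration (hypothetical data): read length vs. multi-mapping in a tandem repeat."""


def count_occurrences(genome: str, read: str) -> int:
    """Count exact, possibly overlapping, matches of `read` in `genome`."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def multimapping_fraction(genome: str, read_length: int) -> float:
    """Fraction of error-free reads, one sampled at every start position,
    whose sequence matches more than one locus in `genome`."""
    n_positions = len(genome) - read_length + 1
    if read_length < 1 or n_positions < 1:
        raise ValueError("read_length must be between 1 and len(genome)")
    multi = sum(
        1
        for i in range(n_positions)
        if count_occurrences(genome, genome[i:i + read_length]) > 1
    )
    return multi / n_positions


# Hypothetical toy genome: C/G-only flanks around a 30 bp tandem array
# made of six copies of the 5-mer "ATTCG".
LEFT_FLANK = "GCCGGCGCGC"
REPEAT_ARRAY = "ATTCG" * 6
RIGHT_FLANK = "CGGCCGCGGC"
GENOME = LEFT_FLANK + REPEAT_ARRAY + RIGHT_FLANK

if __name__ == "__main__":
    for length in (4, 8, 16, 24, 32):
        frac = multimapping_fraction(GENOME, length)
        print(f"read length {length:>2}: {frac:.0%} of reads multi-map")
```

Short reads that fall inside the array match several copies of the repeat, whereas in this toy genome every 32 bp read spans the whole array plus at least one flanking base and maps uniquely. Adding simulated sequencing errors would require approximate matching, which is where the error rates of long reads start to matter.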

Publisher

MDPI AG

Subject

Genetics (clinical), Genetics

Link

https://www.mdpi.com/2073-4425/12/1/48/pdf

Cited by 8 articles.

1. Maptcha: an efficient parallel workflow for hybrid genome scaffolding; BMC Bioinformatics; 2024-08-08

2. Improving the diagnosis of tuberculosis: old and new laboratory tools; Expert Review of Molecular Diagnostics; 2024-06-02

3. OligoY pipeline for full Y chromosome painting; 2024-03-11

4. An Efficient Parallel Sketch-based Algorithmic Workflow for Mapping Long Reads; 2023-11-29

5. Direct detection of alpha satellite DNA with single-base resolution by using abasic Peptide Nucleic Acids and Fluorescent in situ Hybridization; Biosensors and Bioelectronics; 2023-01