Sensitive alignment using paralogous sequence variants improves long read mapping and variant calling in segmental duplications

Published: 2020-07-16

Author:

Prodanov Timofey, Bansal Vikas

Abstract

The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs) – sequence differences between paralogous sequences – to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners, including Minimap2 (74.3% to 90.6%) and BLASR (82.9% to 90.7%), while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8-21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio CCS reads, an additional 8.9 Mbp of DNA sequence was mappable, variant calling achieved a higher F1-score, and 14,713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlap with variants and adversely impact short-read variant calling.
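
As a rough illustration of how PSVs can separate candidate alignment locations, the following toy sketch scores a read against two paralogous copies. It is not DuploMap's actual model: the function name `location_posteriors`, the PSV identifiers, the uniform prior over locations, and the fixed per-base error rate are all assumptions made for this example.

```python
import math


def location_posteriors(read_alleles, candidates, error_rate=0.1):
    """Toy posterior over candidate locations given the read's PSV alleles.

    read_alleles: dict mapping a (hypothetical) PSV id to the base seen in the read.
    candidates:   dict mapping a location name to a dict of PSV id -> reference base
                  at that paralogous copy. Every candidate must define every PSV.
    error_rate:   assumed probability that a read base is wrong (illustrative value).
    """
    log_liks = {}
    for loc, ref_alleles in candidates.items():
        ll = 0.0
        for psv, observed in read_alleles.items():
            if observed == ref_alleles[psv]:
                ll += math.log(1.0 - error_rate)
            else:
                # An error produces one of the three other bases with equal probability.
                ll += math.log(error_rate / 3.0)
        log_liks[loc] = ll

    # Normalize under a uniform prior, using log-sum-exp for numerical stability.
    peak = max(log_liks.values())
    total = sum(math.exp(v - peak) for v in log_liks.values())
    return {loc: math.exp(v - peak) / total for loc, v in log_liks.items()}


# A read that matches copy A at all three PSVs and copy B at only one.
read = {"psv1": "A", "psv2": "G", "psv3": "T"}
copies = {
    "copyA": {"psv1": "A", "psv2": "G", "psv3": "T"},
    "copyB": {"psv1": "C", "psv2": "G", "psv3": "C"},
}
posteriors = location_posteriors(read, copies)
best = max(posteriors, key=posteriors.get)
```

In this sketch, a read would be assigned to `best` with high confidence only when its posterior is close to 1; with few informative PSVs the posteriors stay near uniform and the placement remains ambiguous.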

Publisher

Cold Spring Harbor Laboratory

References: 49 articles.

1. Repetitive DNA and next-generation sequencing: computational challenges and solutions

2. Segmental Duplications: Organization and Impact Within the Current Human Genome Project Assembly

3. Recent Segmental Duplications in the Human Genome

4. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing

5. A frame-shift mutation of PMS2 is a widespread cause of Lynch syndrome

Cited by 1 article.

1. A long read mapping method for highly repetitive reference sequences; 2020-11-02