Accelerating Sequence Alignment to Graphs

Published: 2019-05-27

Author:

Jain Chirag, Dilthey Alexander, Misra Sanchit, Zhang Haowen, Aluru Srinivas

Abstract

Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence to graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices.

Availability: The implementation of our algorithm is available at https://github.com/ParBLiSS/PaSGAL. Data sets used for evaluation are accessible using https://alurulab.cc.gatech.edu/PaSGAL.
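The sequential dynamic program that the abstract mentions can be illustrated with a small sketch. The code below is not taken from PaSGAL; it is a minimal illustration that assumes local (Smith-Waterman style) scoring, one character per vertex, vertices numbered in topological order, and a linear gap penalty. The function name `align_to_dag` and the score values are hypothetical choices made for this example.

```python
from typing import Dict, List


def align_to_dag(labels: List[str], preds: Dict[int, List[int]], query: str,
                 match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of `query` against any path in a DAG.

    Assumptions (illustrative, not from the paper): labels[v] is the single
    character on vertex v, and vertices are numbered in topological order,
    so every predecessor u of v satisfies u < v.
    """
    m = len(query)
    # score[v][j]: best score of an alignment ending at vertex v and query[:j]
    score = [[0] * (m + 1) for _ in labels]
    best = 0
    for v, ch in enumerate(labels):
        # A zero row stands in for "no predecessor" (the path starts at v)
        pred_rows = [score[u] for u in preds.get(v, [])] or [[0] * (m + 1)]
        for j in range(1, m + 1):
            sub = match if ch == query[j - 1] else mismatch
            cell = max(0, score[v][j - 1] + gap)    # skip a query character
            for row in pred_rows:
                cell = max(cell, row[j - 1] + sub)  # align ch with query[j-1]
                cell = max(cell, row[j] + gap)      # skip the vertex character
            score[v][j] = cell
            best = max(best, cell)
    return best


# Toy variation graph: A -> (C | T) -> G -> T, i.e. "ACGT" with a C/T variant.
labels = ["A", "C", "T", "G", "T"]
preds = {1: [0], 2: [0], 3: [1, 2], 4: [3]}
print(align_to_dag(labels, preds, "ATGT"))  # 4: follows the T branch exactly
print(align_to_dag(labels, preds, "AGGT"))  # 2: no branch matches fully
```

Each cell looks at every incoming edge of its vertex, so the work grows with the number of edges times the query length. This is the quadratic cost that motivates the parallel, blocked computation described in the abstract.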

Publisher

Cold Spring Harbor Laboratory

References: 44 articles.

1. Improved genome inference in the MHC using a population reference graph

2. High-accuracy HLA type inference from whole-genome sequencing data using population reference graphs; PLoS Computational Biology, 2016

3. Graphtyper enables population-scale genotyping using pangenome graphs

4. J. A. Sibbesen, L. Maretty, and A. Krogh, “Accurate genotyping across variant classes and lengths using variant graphs,” Nature Publishing Group, Tech. Rep., 2018.

Cited by 13 articles.

1. The Human Pangenome Project: a global resource to map genomic diversity; Nature; 2022-04-20

2. Computational graph pangenomics: a tutorial on data structures and their applications; Natural Computing; 2022-03

3. Population-scale genotyping of structural variation in the era of long-read sequencing; Computational and Structural Biotechnology Journal; 2022

4. Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds; Lecture Notes in Computer Science; 2022

5. Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds; 2021-11-08