A Novel Approach for Accurate Sequence Assembly Using de Bruijn graphs

Published: 2024-06-02

Author:

Prybol Cameron J., Hammack Aeron T., Ashley Euan A., Snyder Michael P.

Abstract

Sequence assembly methods are valuable for reconstructing genomes from shorter read fragments. Modern nucleic acid sequencing instruments produce quality scores associated with each reported base; however, these quality scores are not generally used as a core part of sequence assembly or alignment algorithms. Here, we leverage weighted de Bruijn graphs as graphical probability models representing the relative abundances and qualities of k-mers within FASTQ-encoded observations. We then use these weighted de Bruijn graphs to identify alternate, higher-likelihood candidate sequences compared to the original observations, which are known to contain errors. By improving the original observations with these resampled paths, iteratively across increasing k-lengths, we can use this expectation-maximization approach to “polish” read sets from any sequencing technology according to the mutual information shared in the reads. We use this polishing approach to probabilistically correct simulated short- and long-read datasets with lower coverages and higher error rates than some algorithms can assemble satisfactorily. We find that this approach corrects sequencing errors at rates sufficient to produce error-free and nearly error-free de Bruijn assembly graphs for simulated read-set challenges.
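To make the idea of a quality-weighted de Bruijn graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes Phred+33 quality encoding, and it assumes one particular weighting that the abstract does not specify: each k-mer occurrence contributes the product of its per-base correctness probabilities, and each edge between consecutive k-mers contributes the smaller of the two occurrence weights. The function names `phred_to_prob_correct` and `weighted_debruijn` are hypothetical.

```python
from collections import defaultdict


def phred_to_prob_correct(q_char, offset=33):
    """Convert one FASTQ quality character to P(base call is correct).

    Assumes Phred+33 encoding by default.
    """
    q = ord(q_char) - offset
    return 1.0 - 10 ** (-q / 10)


def weighted_debruijn(records, k):
    """Build k-mer and edge weights from (sequence, quality) pairs.

    Assumed weighting (for illustration only): a k-mer occurrence is
    weighted by the product of its per-base correctness probabilities;
    an edge between consecutive k-mers in a read is weighted by the
    smaller of the two occurrence weights.
    """
    kmer_weight = defaultdict(float)
    edge_weight = defaultdict(float)
    for seq, qual in records:
        prev_kmer, prev_w = None, None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            w = 1.0
            for c in qual[i:i + k]:
                w *= phred_to_prob_correct(c)
            kmer_weight[kmer] += w
            if prev_kmer is not None:
                edge_weight[(prev_kmer, kmer)] += min(prev_w, w)
            prev_kmer, prev_w = kmer, w
    return kmer_weight, edge_weight
```

Under this weighting, a k-mer seen in many reads with high-quality bases accumulates a large weight, while a k-mer supported only by low-quality bases stays near zero. That is the kind of signal a polishing step could use to prefer one path through the graph over another.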

Publisher

Cold Spring Harbor Laboratory
