
proovframe: frameshift-correction for long-read (meta)genomics

Published: 2021-08-24

Author:

Hackl Thomas, Trigodet Florian, Eren A. Murat, Biller Steven J., Eppley John M., Luo Elaine, Burger Andrew, DeLong Edward F., Fischer Matthias G.

Abstract

Long-read sequencing technologies hold great promise for the genomic analysis of complex samples such as microbial communities. Yet, despite improving accuracy, basic gene prediction on long-read data is still often impaired by frameshifts resulting from small indels. Consensus polishing using either complementary short reads or, to a lesser extent, the long reads themselves can mitigate this effect but requires universally high sequencing depth, which is difficult to achieve in complex samples where the majority of community members are rare. Here we present proovframe, a software tool implementing an alternative approach to overcome frameshift errors in long-read assemblies and raw long reads. We utilize protein-to-nucleotide alignments against reference databases to pinpoint indels in contigs or reads and correct them by deleting or inserting 1-2 bases, thereby conservatively restoring reading-frame fidelity in aligned regions. Using simulated and real-world benchmark data, we show that proovframe performs comparably to short-read-based polishing on assembled data, works well with remote protein homologs, and can even be applied directly to raw reads. Together, our results demonstrate that protein-guided frameshift correction significantly improves the analyzability of long-read data, both in combination with and as an alternative to common polishing strategies. Proovframe is available from https://github.com/thackl/proovframe.
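The correction step described in the abstract (deleting or inserting 1-2 bases at positions flagged by a protein-to-nucleotide alignment) can be illustrated with a minimal sketch. This is not proovframe's actual implementation: the function name `apply_frameshift_fixes`, the `(position, delta)` input format, and the lowercase `n` placeholder for inserted bases are all assumptions made for illustration, and the alignment step that would produce the positions is left out entirely.

```python
def apply_frameshift_fixes(seq, fixes, pad="n"):
    """Return seq with frame-restoring edits applied (illustrative only).

    fixes: iterable of (position, delta) pairs, 0-based positions (assumed format).
      delta > 0: the read has `delta` extra bases at position; delete them
      delta < 0: the read lacks `-delta` bases; insert placeholders before position
    """
    bases = list(seq)
    # Apply edits from right to left so earlier coordinates stay valid.
    for pos, delta in sorted(fixes, reverse=True):
        if abs(delta) not in (1, 2):
            raise ValueError("only corrections of 1 or 2 bases are handled")
        if delta > 0:
            del bases[pos:pos + delta]
        else:
            bases[pos:pos] = [pad] * (-delta)
    return "".join(bases)


# Intended coding sequence: ATG GCC AAA TAA
with_extra_base = "ATGGCGCAAATAA"   # spurious G at index 5
with_missing_base = "ATGGCAAATAA"   # one C lost at index 4

print(apply_frameshift_fixes(with_extra_base, [(5, 1)]))
print(apply_frameshift_fixes(with_missing_base, [(4, -1)]))
```

Working from the rightmost edit to the leftmost means each position can be given in the coordinates of the uncorrected sequence, which is how an alignment against the original read or contig would report it. Inserting a placeholder rather than a guessed base restores the reading frame without asserting which nucleotide was lost.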

Publisher

Cold Spring Harbor Laboratory

References (48 articles)

1. Long reads: their purpose and place

2. A comparative evaluation of hybrid error correction methods for error-prone long reads

3. Dohm, J. C., Peters, P., Stralis-Pavese, N. & Himmelbauer, H. Benchmarking of long-read correction methods. NAR Genomics and Bioinformatics 2 (2020).

4. Detecting alternatively spliced transcript isoforms from single‐molecule long‐read sequences without a reference genome

5. Fast and accurate long-read assembly with wtdbg2; Nat. Methods; 2020

Cited by 21 articles.

1. Capturing clinically relevant Campylobacter attributes through direct whole genome sequencing of stool; Microbial Genomics; 2024-08-30

2. Distinct horizontal gene transfer potential of extracellular vesicles versus viral-like particles in marine habitats; 2024-07-18

3. Biogeography and impact of nitrous oxide reducers in rivers across a broad environmental gradient on emission rates; Environmental Microbiology; 2024-05

4. Snakemake workflows for long-read bacterial genome assembly and evaluation; Gigabyte; 2024-04-01

5. Exophiala chapopotensis sp. nov., an extremotolerant black yeast from an oil-polluted soil in Mexico; phylophenetic approach to species hypothesis in the Herpotrichiellaceae family; PLOS ONE; 2024-02-14