Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads

Published: 2022-10-13
Volume: 8
Page: 1587
ISSN: 2046-1402
Journal: F1000Research
Language: English
Journal abbreviation: F1000Res

Authors:

Yang, Andrian; Tang, Joshua Y. S.; Troup, Michael; Ho, Joshua W. K.

Abstract

Read alignment is an important step in RNA-seq analysis because the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads that should have been aligned, a problem we term the false-negative non-alignment problem. Here we present Scavenger, a Python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome than previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is small relative to the total number of reads, adding these recovered reads can affect downstream analyses, especially the estimation of expression and differential expression of lowly expressed genes such as pseudogenes.
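The abstract describes the recovery mechanism only at a high level: a putative location for an unaligned read is inferred from its similarity to reads that did align. The sketch below illustrates one way such similarity could yield a candidate location, using shared k-mer voting. It is an assumption for illustration, not Scavenger's actual implementation; the names `AlignedRead`, `build_index` and `putative_location`, the k-mer size, the `min_votes` threshold and the assumption of ungapped (unspliced, indel-free) alignments are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical record for a read the aligner already placed."""
    name: str
    seq: str
    chrom: str
    pos: int  # 0-based leftmost reference coordinate; alignment assumed ungapped


Hit = Tuple[AlignedRead, int]  # (aligned read, offset of the k-mer within it)


def kmers(seq: str, k: int) -> List[Tuple[int, str]]:
    """Return (offset, k-mer) pairs for every k-mer in seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_index(aligned: List[AlignedRead], k: int) -> Dict[str, List[Hit]]:
    """Map each k-mer to every aligned read (and offset) that contains it."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for read in aligned:
        for offset, kmer in kmers(read.seq, k):
            index[kmer].append((read, offset))
    return index


def putative_location(seq: str, index: Dict[str, List[Hit]], k: int,
                      min_votes: int = 2) -> Optional[Tuple[str, int]]:
    """Vote for the reference start of an unaligned read via shared k-mers."""
    votes: Counter = Counter()
    for u_off, kmer in kmers(seq, k):
        for read, a_off in index.get(kmer, []):
            # Project the unaligned read's start onto the reference through
            # the aligned read's known position (valid only if ungapped).
            votes[(read.chrom, read.pos + a_off - u_off)] += 1
    if not votes:
        return None
    location, count = votes.most_common(1)[0]
    return location if count >= min_votes else None


aligned_reads = [
    AlignedRead("r1", "ACGTTGCAAGGCTA", "chr1", 100),
    AlignedRead("r2", "GGGGCCCCAAAA", "chr2", 500),
]
K = 4
idx = build_index(aligned_reads, K)
# The unaligned read overlaps r1 but carries one mismatch (A->T).
print(putative_location("TTGCATGGCTA", idx, K))  # ('chr1', 103)
print(putative_location("TTTTTTTT", idx, K))     # None
```

Because voting needs only some exact k-mer matches, the read carrying a mismatch still receives four consistent votes for `chr1:103`, which loosely illustrates why variant-bearing reads might be placeable through their aligned neighbours. A real pipeline would presumably still confirm such a candidate by aligning the read to the reference around that locus; this sketch stops at the candidate.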

Funders

Amazon Web Services

Department of Education, Australian Government

National Health and Medical Research Council

National Heart Foundation of Australia

Publisher

F1000 Research Ltd

Subject

General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

https://f1000research.com/articles/8-1587/v2/pdf

References: 27 articles

1. Kim D. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015.

2. Dobin A. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013.

3. Liao Y. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res. 2013.

4. Philippe N. CRAC: an integrated approach to the analysis of RNA-seq reads. Genome Biol. 2013.

5. Wang K. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010.

Cited by 2 articles.

1. Design, execution, and interpretation of plant RNA-seq analyses. Frontiers in Plant Science. 2023-06-30.

2. Mobile genomics: tools and techniques for tackling transposons. Philosophical Transactions of the Royal Society B: Biological Sciences. 2020-02-10.