Shark: fishing relevant reads in an RNA-Seq sample

Published:2020-09-14 Issue:4 Volume:37 Page:464-472
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Denti Luca¹, Pirola Yuri¹, Previtali Marco¹, Ceccato Tamara¹, Della Vedova Gianluca¹, Rizzi Raffaella¹, Bonizzoni Paola¹

Affiliation:

1. Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy

Abstract

Motivation: Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset leads to improved efficiency without compromising the results of the study.

Results: We introduce a novel computational problem, called gene assignment, and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists of extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results.

Availability and implementation: The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark.

Supplementary information: Supplementary data are available at Bioinformatics online.
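To make the gene assignment problem stated in the abstract more concrete, here is a minimal alignment-free sketch in Python. It is not Shark's implementation, which the abstract does not describe: it assumes a plain exact k-mer index and a simple "fraction of shared k-mers" score. The function names and the parameters `k` and `min_fraction` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, in uppercase."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(genes, k):
    """Map each k-mer to the set of gene names whose sequence contains it."""
    index = defaultdict(set)
    for name, seq in genes.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def assign_read(read, index, k, min_fraction):
    """Return the genes that share at least min_fraction of the read's k-mers."""
    hits = defaultdict(int)
    total = 0
    for km in kmers(read, k):
        total += 1
        # .get avoids inserting empty entries into the defaultdict index
        for gene in index.get(km, ()):
            hits[gene] += 1
    if total == 0:  # read shorter than k: nothing to score
        return set()
    return {g for g, n in hits.items() if n / total >= min_fraction}


def gene_assignment(reads, genes, k=5, min_fraction=0.6):
    """Keep only the reads assigned to at least one gene of the panel.

    reads and genes are dicts mapping an identifier to a sequence.
    The defaults for k and min_fraction are illustrative assumptions.
    """
    index = build_index(genes, k)
    result = {}
    for read_id, seq in reads.items():
        assigned = assign_read(seq, index, k, min_fraction)
        if assigned:
            result[read_id] = assigned
    return result
```

Calling `gene_assignment(reads, genes)` returns a dictionary from read identifier to the set of panel genes the read was assigned to; reads that match no gene are dropped, which is the filtering effect the abstract refers to. Because this toy score needs most of a read's k-mers to match exactly, it does not address the case the abstract flags as harder, reads carrying evidence of novel splicing events.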

Funder

European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa779/34774651/btaa779.pdf

Cited by 8 articles.

1. Differential quantification of alternative splicing events on spliced pangenome graphs;2023-11-07

2. Unveiling the Robustness of Machine Learning Models in Classifying COVID-19 Spike Sequences;2023-08-24

3. Benchmarking machine learning robustness in Covid-19 genome sequence classification;Scientific Reports;2023-03-13

4. PDB2Vec: Using 3D Structural Information for Improved Protein Analysis;Bioinformatics Research and Applications;2023

5. Unveiling the Robustness of Machine Learning Models in Classifying COVID-19 Spike Sequences;Bioinformatics Research and Applications;2023