Accurate assembly of multiple RNA-seq samples with Aletsch

Published:2024-06-28 Issue:Supplement_1 Volume:40 Page:i307-i317
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Shi Qian¹, Zhang Qimin¹, Shao Mingfu¹,²

Affiliation:

1. Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States

2. Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, United States

Abstract

Motivation: High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations.

Results: We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a “bridging” system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages “supporting” information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate Aletsch’s significant outperformance over existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%–62.1% and PsiCLASS by 23.0%–175.5% on human datasets.

Availability and implementation: Aletsch is freely available at https://github.com/Shao-Group/aletsch. Scripts that reproduce the experimental results of this manuscript are available at https://github.com/Shao-Group/aletsch-test.
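As an illustration of the pAUC metric mentioned in the abstract, the following minimal Python sketch computes a partial area under a precision-recall curve restricted to points at or above a precision threshold, along with the relative improvement between two such scores. The abstract does not state the exact precision constraint or interpolation rule used in the paper, so the threshold handling here is an assumption, and the names `partial_pr_auc` and `relative_gain_percent` are hypothetical helpers, not part of Aletsch.

```python
from typing import Sequence


def partial_pr_auc(recall: Sequence[float],
                   precision: Sequence[float],
                   min_precision: float) -> float:
    """Trapezoidal area under a precision-recall curve, counting only
    segments whose two endpoints both have precision >= min_precision.

    Assumption: no interpolation is done where the curve crosses the
    threshold; the paper's exact definition may differ.
    """
    if len(recall) != len(precision):
        raise ValueError("recall and precision must have the same length")
    # Order the operating points by increasing recall.
    points = sorted(zip(recall, precision))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if p0 >= min_precision and p1 >= min_precision:
            area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def relative_gain_percent(new_score: float, baseline_score: float) -> float:
    """Percentage by which new_score exceeds baseline_score."""
    return (new_score - baseline_score) / baseline_score * 100.0
```

With this kind of measure, a statement such as "surpasses TransMeta by 22.9%" would correspond to `relative_gain_percent(aletsch_pauc, transmeta_pauc)` evaluating to about 22.9, where both arguments are pAUC values computed under the same precision constraint.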

Funder

National Science Foundation

National Institutes of Health

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bioinformatics/article-pdf/40/Supplement_1/i307/58354938/btae215.pdf

References (21 articles)

1. MITIE: simultaneous RNA-Seq-based transcript identification and quantification in multiple samples; Behr; Bioinformatics, 2013

2. Polyester: simulating RNA-seq datasets with differential transcript expression; Frazee; Bioinformatics, 2015

3. Single-cell RNA counting at allele and isoform resolution using Smart-seq3; Hagemann-Jensen; Nat Biotechnol, 2020

4. Scalable single-cell RNA sequencing from full transcripts with Smart-seq3xpress; Hagemann-Jensen; Nat Biotechnol, 2022