IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles

Published:2020-10-05 Issue:5 Volume:37 Page:650-658
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Shi Xu¹²,Neuwald Andrew F³,Wang Xiao¹,Wang Tian-Li⁴,Hilakivi-Clarke Leena⁵,Clarke Robert⁵,Xuan Jianhua¹

Affiliation:

1. Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA

2. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA

3. Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, MD 21201, USA

4. Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA

5. Hormel Institute, University of Minnesota, 801 16th Ave NE, Austin, MN 55912, USA

Abstract

Motivation

High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure.

Results

We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model the isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods for the integrated assembly of phenotype-specific transcripts. Experimental results demonstrate that, despite sequencing errors, IntAPT exhibits robust performance across multiple samples, resulting in notably improved identification of expressed isoforms of low abundance.

Availability and implementation

The IntAPT package is available at http://github.com/henryxushi/IntAPT.

Supplementary information

Supplementary data are available at Bioinformatics online.
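
The two-layer structure described in the abstract (isoform presence at the group layer, isoform abundance at the sample layer, tied together by a spike-and-slab prior) can be illustrated with a small generative sketch. This is not the IntAPT implementation and does not perform Gibbs sampling or inference; it only draws from a toy prior. The function name `sample_two_layer_prior`, the inclusion probability `pi`, and the exponential slab with rate `slab_rate` are all assumptions chosen for illustration, not details taken from the paper.

```python
import random


def sample_two_layer_prior(n_isoforms, n_samples, pi=0.3, slab_rate=1.0, seed=0):
    """Draw from a toy two-layer spike-and-slab prior (illustrative only).

    Group layer: z[j] is 1 if candidate isoform j is present in the
    phenotype group, drawn as Bernoulli(pi).
    Sample layer: abundance[s][j] is exactly 0.0 when z[j] == 0 (the spike)
    and an exponential draw (the slab, an assumed choice) when z[j] == 1.
    """
    rng = random.Random(seed)

    # Group-level presence indicators, shared by all samples of the phenotype.
    z = [1 if rng.random() < pi else 0 for _ in range(n_isoforms)]

    # Sample-level abundances: absent isoforms are forced to zero in every
    # sample, which is what makes the expressed set sparse.
    abundance = [
        [rng.expovariate(slab_rate) if z[j] else 0.0 for j in range(n_isoforms)]
        for _ in range(n_samples)
    ]
    return z, abundance


if __name__ == "__main__":
    z, abundance = sample_two_layer_prior(n_isoforms=8, n_samples=4)
    print("present:", z)
    for row in abundance:
        print(["%.2f" % a for a in row])
```

Each printed row is one sample. Columns whose group-level indicator is 0 are zero in every row, while columns with indicator 1 vary from sample to sample, which mirrors the dependency between isoform existence and isoform expression that the abstract describes.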

Funder

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa852/34542360/btaa852.pdf

Reference: 38 articles.

1. Bayesian nonparametric discovery of isoforms and individual specific quantification;Aguiar;Nat. Commun,2018

2. A convex formulation for joint RNA isoform detection and quantification from multiple RNA-seq samples;Bernard;BMC Bioinformatics,2015

3. Comprehensive molecular portraits of human breast tumours;Nature,2012

4. A survey of best practices for RNA-seq data analysis;Conesa;Genome Biol,2016

5. Sampling truncated normal, beta, and gamma densities;Damien;J. Comput. Graph. Stat,2001

Cited by 1 article.

1. Long noncoding RNA study: Genome-wide approaches;Genes & Diseases;2022-11