Adaptive, sample-specific parameter selection for more accurate transcript assembly

Published: 2024-01-30

Author:

Shen Yihang, Yan Zhiwen, Kingsford Carl

Abstract

Motivation: Transcript assemblers are tools that reconstruct expressed transcripts from RNA-seq data. These tools have a large number of tunable parameters, and accurate transcript assembly requires setting them suitably. Because RNA-seq samples are heterogeneous, a single default setting or a small fixed set of parameter candidates can only support good transcript assembly performance on average and is often suboptimal for individual samples. Manually tuning parameters for each sample is extremely time-consuming and requires specialized experience. Developing an automated system that can advise good parameter settings for individual samples is therefore an important problem.

Results: Using Bayesian optimization and contrastive learning, we develop a new automated parameter advising system for transcript assembly that can generate sets of sample-specific parameter candidates. Our framework achieves efficient sample-specific parameter advising by learning parameter knowledge from a large representative set of existing RNA-seq samples and transferring that knowledge to unseen samples. We use Scallop and StringTie, two well-known transcript assemblers, to test our framework on two collections of RNA-seq samples. Results show that our new parameter advising system significantly outperforms the previous advising method on each dataset and with each transcript assembler.

The source code to reproduce the results from this study can be found at https://github.com/Kingsford-Group/autoparadvisor.
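To make the idea of parameter advising concrete, the following is a minimal sketch, not the authors' implementation. It only illustrates the final selection step: given a set of parameter candidates, evaluate each on a sample and keep the best one. The parameter name `min_coverage`, the function names, and the toy scoring function are all hypothetical stand-ins; a real score would come from running an assembler and evaluating its output, and the candidate set would come from the learned advisor described in the abstract.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def advise(candidates: List[Params],
           score: Callable[[Params], float]) -> Tuple[Params, float]:
    """Return the highest-scoring candidate for one sample, with its score."""
    best = max(candidates, key=score)
    return best, score(best)


def toy_score_factory(optimum: float) -> Callable[[Params], float]:
    """Hypothetical stand-in for 'run the assembler and measure accuracy'.

    The score peaks when the (hypothetical) parameter equals a
    sample-specific optimum, mimicking heterogeneity across samples.
    """
    def score(p: Params) -> float:
        return -(p["min_coverage"] - optimum) ** 2
    return score


# One default setting versus a small set of candidates (all values invented).
default: Params = {"min_coverage": 1.0}
candidates: List[Params] = [{"min_coverage": v} for v in (0.5, 1.0, 2.0, 4.0)]

# Two toy samples whose best setting differs.
samples = {"sample_a": 0.6, "sample_b": 3.5}

chosen: Dict[str, Params] = {}
for name, optimum in samples.items():
    score = toy_score_factory(optimum)
    best, best_score = advise(candidates, score)
    chosen[name] = best
    # The advised candidate is never worse than the default here,
    # because the default is itself one of the candidates.
    print(name, best, best_score >= score(default))
```

In this toy setup the two samples end up with different advised settings, which is the behaviour a single default cannot provide. Evaluating every candidate costs one assembler run per candidate, so keeping the candidate set small and sample-specific matters in practice.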

Publisher

Cold Spring Harbor Laboratory
