LSTrAP-denovo: Automated Generation of Transcriptome Atlases for Eukaryotic Species Without Genomes

Published: 2023-03-06

Author:

Lim Peng Ken, Mutwil Marek

Abstract

Motivation

Despite the abundance of species with transcriptomic data, a significant number of the species still lack genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein-coding transcripts from RNA-sequencing (RNA-seq) data, the datasets used often only feature samples of arbitrarily-selected or similar experimental conditions which might fail to capture condition-specific transcripts.

Results

We developed the Large-Scale Transcriptome Assembly Pipeline for de novo assembled transcripts (LSTrAP-denovo) to automatically generate transcriptome atlases of eukaryotic species. Specifically, given an NCBI TaxID, LSTrAP-denovo can (1) filter undesirable RNA-seq accessions based on read data, (2) select RNA-seq accessions via unsupervised machine learning to construct a sample-balanced dataset for download, (3) assemble transcripts via over-assembly, (4) functionally annotate coding sequences (CDS) from assembled transcripts and (5) generate transcriptome atlases in the form of expression matrices for downstream transcriptomic analyses.

Availability and Implementation

LSTrAP-denovo is easy to implement, written in Python, and is freely available at https://github.com/pengkenlim/LSTrAP-denovo/.

Supplementary Information

Supplementary data are available in the forms of supplementary figures, supplementary tables, and supplementary methods.
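Step (2) of the pipeline, selecting accessions by unsupervised machine learning to obtain a sample-balanced dataset, can be illustrated with a small sketch. The abstract does not name the clustering algorithm or the features used, so everything here is an assumption made for illustration: k-means stands in for the unsupervised method, the two-value profiles and the accession IDs are made up, and the function names `kmeans` and `select_balanced` are hypothetical and not part of LSTrAP-denovo.

```python
import math
from collections import defaultdict


def kmeans(points, k, iterations=20):
    """Tiny k-means; centroids start at the first k points, so results are deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(tuple(p), centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def select_balanced(accessions, profiles, k, per_cluster):
    """Cluster the profiles, then keep at most `per_cluster` accessions per cluster."""
    labels = kmeans(profiles, k)
    by_cluster = defaultdict(list)
    for accession, label in zip(accessions, labels):
        by_cluster[label].append(accession)
    selected = []
    for label in sorted(by_cluster):
        selected.extend(by_cluster[label][:per_cluster])
    return selected


# Hypothetical accessions with made-up two-feature profiles (assumption, not real data).
accessions = ["SRR001", "SRR002", "SRR003", "SRR004", "SRR005", "SRR006"]
profiles = [(0.1, 0.2), (9.8, 10.1), (0.2, 0.1), (0.0, 0.3), (10.2, 9.9), (0.3, 0.2)]
selected = select_balanced(accessions, profiles, k=2, per_cluster=2)
```

Capping the number of accessions taken from each cluster keeps an over-represented condition (four of the six toy samples here) from dominating the dataset, which is the intuition behind a sample-balanced selection.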

Publisher

Cold Spring Harbor Laboratory
