
LSTrAP‐denovo: Automated Generation of Transcriptome Atlases for Eukaryotic Species Without Genomes

Published: 2024-07 Issue: 4 Volume: 176
ISSN: 0031-9317
Container-title: Physiologia Plantarum
Language: en
Short-container-title: Physiologia Plantarum

Author:

Lim Peng Ken¹, Wang Ruoxi¹, Mutwil Marek¹

Affiliation:

1. School of Biological Sciences, Nanyang Technological University, Singapore, Singapore

Abstract

Despite the abundance of species with transcriptomic data, a significant number of species still lack sequenced genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein‐coding transcripts from RNA‐sequencing (RNA‐seq) data, the datasets used often only feature samples of arbitrarily selected or similar experimental conditions, which might fail to capture condition‐specific transcripts. We developed the Large‐Scale Transcriptome Assembly Pipeline for de novo assembled transcripts (LSTrAP‐denovo) to automatically generate transcriptome atlases of eukaryotic species. Specifically, given an NCBI TaxID, LSTrAP‐denovo can (1) filter undesirable RNA‐seq accessions based on read data, (2) select RNA‐seq accessions via unsupervised machine learning to construct a sample‐balanced dataset for download, (3) assemble transcripts via over‐assembly, (4) functionally annotate coding sequences (CDS) from assembled transcripts and (5) generate transcriptome atlases in the form of expression matrices for downstream transcriptomic analyses. LSTrAP‐denovo is easy to implement, written in Python, and is freely available at https://github.com/pengkenlim/LSTrAP-denovo/.
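To make step (2) more concrete, the following is a minimal sketch of what building a sample‐balanced selection could look like once accessions have been grouped by some unsupervised clustering method. It is not taken from the LSTrAP‐denovo code base: the function name `select_balanced`, the round‐robin strategy, the accession IDs and the cluster labels are all assumptions made for illustration, and the clustering step itself is assumed to have been done elsewhere.

```python
from collections import defaultdict
from itertools import zip_longest


def select_balanced(accessions, cluster_labels, n_select):
    """Pick up to n_select accessions, drawing one per cluster per round.

    Hypothetical helper: cluster_labels are assumed to come from an
    unsupervised clustering step that is not shown here.
    """
    if len(accessions) != len(cluster_labels):
        raise ValueError("accessions and cluster_labels must have equal length")
    if n_select <= 0:
        return []

    # Group accessions by cluster label, keeping their input order.
    by_cluster = defaultdict(list)
    for accession, label in zip(accessions, cluster_labels):
        by_cluster[label].append(accession)

    selected = []
    # Each "round" takes at most one accession from every cluster, so small
    # clusters are represented before large clusters contribute a second sample.
    ordered_groups = [by_cluster[label] for label in sorted(by_cluster)]
    for round_members in zip_longest(*ordered_groups):
        for accession in round_members:
            if accession is None:  # this cluster is already exhausted
                continue
            selected.append(accession)
            if len(selected) == n_select:
                return selected
    return selected


# Made-up accession IDs and labels: cluster 0 is over-represented.
example_accessions = ["SRR1", "SRR2", "SRR3", "SRR4", "SRR5", "SRR6"]
example_labels = [0, 0, 0, 0, 1, 2]
example_selection = select_balanced(example_accessions, example_labels, 4)
print(example_selection)
```

With these made‐up inputs, the over‐represented cluster contributes only two of the four selected accessions, while each singleton cluster contributes its one sample. Round‐robin selection is only one simple way to balance a dataset; the pipeline's actual selection logic may differ.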

Funder

Ministry of Education - Singapore

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/ppl.14407

References: 58 articles.

1. De novo leaf transcriptome assembly of Bougainvillea spectabilis for the identification of genes involves in the secondary metabolite pathways

2. Gene Ontology: tool for the unification of biology

3. The InterPro protein families and domains database: 20 years on

4. Trimmomatic: a flexible trimmer for Illumina sequence data

5. Near-optimal probabilistic RNA-seq quantification

Cited by 1 article.

1. Constructing Ensemble Gene Functional Networks Capturing Tissue/condition-specific Co-expression from Unlabled Transcriptomic Data with TEA-GCN (2024-07-23)