UnigeneFinder: An automated pipeline for gene calling from transcriptome assemblies without a reference genome

Published: 2024-08-19

Author:

Xue Bo, Prado Karine, Rhee Seung Yon, Stata Matt

Abstract

For most species in nature, transcriptome data is much more readily available than genome data. Without a reference genome, however, gene calling is cumbersome and inaccurate due to the high degree of redundancy in de novo transcriptome assemblies. To simplify and increase the accuracy of de novo transcriptome assembly in the absence of a reference genome, we developed UnigeneFinder. Combining several clustering methods, UnigeneFinder substantially reduces the redundancy typical of raw transcriptome assemblies. This pipeline offers an effective solution to the problem of inflated transcript numbers, achieving a closer representation of the actual underlying genome. UnigeneFinder performs comparably to or better than existing tools on plant species with varying genome complexities. UnigeneFinder is the only available transcriptome redundancy solution that fully automates the generation of primary transcript, coding region, and protein sequences, analogous to those available for high-quality reference genomes. These features, coupled with the pipeline’s cross-platform implementation and its focus on automation and an accessible user interface, make UnigeneFinder a useful tool for many downstream sequence-based analyses in non-model organisms lacking a reference genome, including differential gene expression analysis, accurate ortholog identification, functional enrichment, and evolutionary analyses. UnigeneFinder also runs efficiently on both high-performance computing (HPC) systems and personal computers, further reducing barriers to use.

Publisher

Cold Spring Harbor Laboratory
