Authors
Ahmadi Hosein, Sheikh-Assadi Morteza, Fatahi Reza, Zamani Zabihollah, Shokrpour Majid
Abstract
Non-erroneous, well-optimized transcriptome assembly is a crucial prerequisite for reliable downstream analyses. Each de novo assembler has algorithm-dependent strengths and weaknesses in handling assembly issues, so each should be tested specifically for a given dataset. Here, we examined the efficiency of seven state-of-the-art assemblers on ~30 Gb of data obtained from mRNA sequencing of Thymus daenensis. In an ensemble workflow, combining the outputs of different assemblers with an additional redundancy-reduction step could generate an optimized outcome in terms of completeness, annotatability, and ORF richness. Based on the normalized scores of 16 benchmarking metrics, the assemblers ranked, from best to worst: EvidentialGene, BinPacker, Trinity, rnaSPAdes, CAP3, IDBA-trans, and Velvet-Oases. EvidentialGene, the best-performing assembler, produced a total of 316,786 transcripts, of which 235,730 (74%) were predicted to have a unique protein hit (against UniRef100), and half of its transcripts contained an ORF. The total number of unique BLAST hits for EvidentialGene was approximately three times that of the worst-performing assembler (Velvet-Oases). EvidentialGene also captured 17% and 7% more average BLAST hits than BinPacker and Trinity, respectively. Although BinPacker and CAP3 produced longer transcripts, EvidentialGene showed higher collinearity between transcript size and ORF length. Compared with the other programs, EvidentialGene yielded a higher number of optimal transcript sets, more full-length transcripts, and fewer possible misassemblies. Our findings corroborate that, in non-model species, relying on a single assembler may not give an entirely satisfactory result. Therefore, this study proposes an ensemble approach that incorporates the EvidentialGene pipeline to obtain a superior assembly for T. daenensis.
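The abstract ranks assemblers by normalized scores over 16 benchmarking metrics but does not state the normalization scheme. The following is a minimal sketch of one common approach, not the study's method. It assumes per-metric min-max scaling, equal weights, and an explicit "higher is better" or "lower is better" direction for each metric. The function names, metric names, assembler labels ("A", "B", "C"), and values are all hypothetical and are not data from the study.

```python
# Hypothetical sketch: rank assemblers by a composite of min-max normalized metrics.
# Assumptions (not from the study): min-max scaling per metric, equal weights,
# and a per-metric flag stating whether higher values are better.
from typing import Dict, List, Tuple


def minmax(values: List[float], higher_is_better: bool) -> List[float]:
    """Scale values to [0, 1]; 1 is always the best score for this metric."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all tied: every assembler gets full credit
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_assemblers(
    metrics: Dict[str, Dict[str, float]],
    higher_is_better: Dict[str, bool],
) -> List[Tuple[str, float]]:
    """Sum normalized scores across metrics and sort assemblers best-first."""
    names = list(metrics)
    totals = {name: 0.0 for name in names}
    for metric, better_high in higher_is_better.items():
        column = [metrics[name][metric] for name in names]
        for name, score in zip(names, minmax(column, better_high)):
            totals[name] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy, hypothetical inputs for illustration only.
toy = {
    "A": {"annotated_fraction": 0.74, "possible_misassemblies": 100},
    "B": {"annotated_fraction": 0.60, "possible_misassemblies": 150},
    "C": {"annotated_fraction": 0.50, "possible_misassemblies": 300},
}
direction = {"annotated_fraction": True, "possible_misassemblies": False}
ranking = rank_assemblers(toy, direction)
print(ranking)  # A ranks first, C last
```

Equal weighting treats every metric as equally important. A real benchmark might weight metrics differently or use z-scores instead of min-max scaling, and either choice can change the final ranking.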
Publisher
Springer Science and Business Media LLC