Hardware Performance Evaluation of De novo Transcriptome Assembly Software in Amazon Elastic Compute Cloud

Author:

Mora-Márquez Fernando (1), Vázquez-Poletti José Luis (2), Chano Víctor (1), Collada Carmen (1), Soto Álvaro (1), de Heredia Unai López (1)

Affiliation:

1. GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain

2. GI Arquitectura de Sistemas Distribuidos, Dpto. Arquitectura de Computadores y Automatica, Facultad de Informatica, Universidad Complutense de Madrid, Ciudad Universitaria, 28040 Madrid, Spain

Abstract

Background: Bioinformatics software for RNA-seq analysis has high computational requirements in terms of the number of CPUs, RAM size, and processor characteristics. De novo transcriptome assembly in particular demands a large computational infrastructure because of the massive data size and the complexity of the algorithms employed. Comparative studies of the quality of the transcriptomes yielded by de novo assemblers have been published, but they lack a hardware efficiency-oriented approach that would help select the assembly hardware platform in a cost-efficient way.

Objective: We tested the performance of two popular de novo transcriptome assemblers, Trinity and SOAPdenovo-Trans (SDNT), in terms of cost-efficiency and quality to assess their limitations, and we provide troubleshooting advice and guidelines for running transcriptome assemblies efficiently.

Methods: We built virtual machines with different hardware characteristics (CPU number, RAM size) in the Amazon Elastic Compute Cloud of Amazon Web Services. Using simulated and real data sets, we measured the elapsed time, cost, CPU percentage, and output size of small and large data set assemblies.

Results: For small data sets, SDNT outperformed Trinity by an order of magnitude, significantly reducing the duration and cost of the assembly. For large data sets, Trinity performed better than SDNT. Both assemblers provide good-quality transcriptomes.

Conclusion: The selection of the optimal transcriptome assembler and the provision of computational resources depend on the combined effect of the size and complexity of the RNA-seq experiment.
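The cost comparison described in the Methods can be illustrated with a minimal sketch. It assumes that the cost of a run is simply elapsed time multiplied by an hourly instance price; this ignores storage, data transfer, and any billing granularity, and is not taken from the paper. The class `AssemblyRun`, the function `cheapest`, the labels, the timings, and the prices are all hypothetical placeholders, not measurements from the study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AssemblyRun:
    """One assembly run on one virtual machine (all values are hypothetical)."""
    assembler: str           # placeholder label, not a real measurement
    instance_type: str       # placeholder label, not a real EC2 instance name
    elapsed_seconds: float   # wall-clock time of the run
    hourly_price_usd: float  # assumed on-demand price per hour

    def cost_usd(self) -> float:
        # Assumption: cost scales linearly with elapsed time.
        return self.elapsed_seconds / 3600.0 * self.hourly_price_usd


def cheapest(runs: Iterable[AssemblyRun]) -> AssemblyRun:
    """Return the run with the lowest estimated cost."""
    return min(runs, key=lambda run: run.cost_usd())


# Made-up example: a slower run on a cheap machine versus a faster run
# on a more expensive machine.
runs = [
    AssemblyRun("assembler_a", "small_vm", 7200.0, 0.50),
    AssemblyRun("assembler_a", "large_vm", 3000.0, 1.50),
]

for run in runs:
    print(f"{run.assembler} on {run.instance_type}: {run.cost_usd():.2f} USD")
print("cheapest:", cheapest(runs).instance_type)
```

With these placeholder numbers the faster machine is not the cheaper one, which is the kind of trade-off between elapsed time and cost that a hardware efficiency-oriented comparison has to weigh.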

Funder

Spanish Ministry of Economy and Competitiveness-MINECO

Spanish National Parks Agency, Ministry of Agriculture

Publisher

Bentham Science Publishers Ltd.

Subject

Computational Mathematics, Genetics, Molecular Biology, Biochemistry

Cited by 3 articles.
