MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis

Published: 2017-09-26
Volume: 20, Issue: 4, Pages: 1151-1159
ISSN: 1467-5463
Container title: Briefings in Bioinformatics
Language: en

Author:

Meyer Folker¹²³⁴⁵, Bagchi Saurabh⁶, Chaterji Somali⁷, Gerlach Wolfgang⁸, Grama Ananth⁹, Harrison Travis⁸, Paczian Tobias¹⁰, Trimble William L¹¹, Wilke Andreas¹²

Affiliation:

1. Computational Biologist, Argonne National Laboratory, USA

2. Department of Medicine, University of Chicago, Chicago, USA

3. Computation Institute, University of Chicago, Chicago, USA

4. Biology Division, Argonne National Laboratory, USA

5. Institute of Genomics and Systems Biology, Argonne National Laboratory and University of Chicago, USA

6. School of Electrical and Computer Engineering and Department of Computer Science, Purdue University, USA

7. Research Faculty, Purdue University, USA

8. Bioinformatics Senior Software Engineer, University of Chicago, Argonne National Laboratory, USA

9. Computer Science, Purdue University, USA

10. Senior Developer, University of Chicago, Argonne National Laboratory, USA

11. Postdoctoral researcher, Argonne National Laboratory, USA

12. Principal Bioinformatics Specialist, Argonne National Laboratory, University of Chicago, USA

Abstract

As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service that constantly runs large-volume scientific workflows, MG-RAST is well placed to perform benchmarking and to implement algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example: we use existing, well-studied data sets as gold standards, representing different environments and different sequencing technologies, to evaluate any change to the pipeline. These well-understood data sets within MG-RAST currently serve as our benchmarking platform. Using artificial data sets for pipeline performance optimization has not proved valuable, because they do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improving the workflow. We are currently working on versions 4.02 and 4.1, both of which incorporate significant input from the community and our partners. These versions will enable double barcoding, support stronger inferences from longer-read technologies, and increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, to facilitate both the development and the efficient high-performance implementation of the community's data analysis tasks.
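The abstract describes evaluating pipeline changes against gold-standard data sets in terms of sensitivity, specificity and run-time cost. The following is a minimal sketch of how such a comparison could be scored. It assumes that each read in a gold-standard data set carries a known binary label (for example, whether it is an rRNA read) and that a pipeline variant emits one predicted label per read. The function `confusion_rates`, the read IDs and the labels are hypothetical illustrations. They are not part of MG-RAST.

```python
from typing import Dict, Tuple


def confusion_rates(gold: Dict[str, bool], predicted: Dict[str, bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of predicted labels against a gold standard.

    Hypothetical helper: `gold` and `predicted` map read IDs to a binary label
    (True = feature present, e.g. the read is rRNA). Reads absent from
    `predicted` are counted as negative calls.
    """
    tp = fp = tn = fn = 0
    for read_id, truth in gold.items():
        call = predicted.get(read_id, False)
        if truth and call:
            tp += 1  # true positive
        elif truth and not call:
            fn += 1  # false negative: a real feature was missed
        elif not truth and call:
            fp += 1  # false positive: a spurious call
        else:
            tn += 1  # true negative
    # Guard against empty classes so both ratios are always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical gold standard and output of one pipeline variant.
    gold = {"r1": True, "r2": True, "r3": False, "r4": False}
    variant = {"r1": True, "r2": False, "r3": False, "r4": True}
    print(confusion_rates(gold, variant))  # prints (0.5, 0.5)
```

Run-time cost would be measured separately, for example by timing each variant on the same data set. A faster tool would then be weighed against any change in these two rates.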

Funder

National Science Foundation

Gordon and Betty Moore Foundation

National Institutes of Health

U.S. Department of Energy

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology,Information Systems

Link

http://academic.oup.com/bib/article-pdf/20/4/1151/31550497/bbx105.pdf

References: 66 articles