BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments

Published:2018-08-29 Volume:6 Page:e5551
ISSN:2167-8359
Container-title:PeerJ
language:en

Author:

Mondelli Maria Luiza¹, Magalhães Thiago¹, Loss Guilherme¹, Wilde Michael², Foster Ian², Mattoso Marta³, Katz Daniel⁴, Barbosa Helio¹⁵, de Vasconcelos Ana Tereza R.¹, Ocaña Kary¹, Gadelha Luiz M.R.¹

Affiliation:

1. National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil

2. Computation Institute, Argonne National Laboratory/University of Chicago, Chicago, IL, USA

3. Computer and Systems Engineering Program, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil

4. National Center for Supercomputing Applications, University of Illinois, Urbana, IL, USA

5. Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, Brazil

Abstract

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.

Funder

Brazilian funding agencies CNPq, CAPES, and FAPERJ

Publisher

PeerJ

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience

Link

https://peerj.com/articles/5551.pdf

References: 54 articles.

1. Comparative genomics as a tool to understand evolution and disease;Alföldi;Genome Research,2013

2. Challenges and approaches for distributed workflow-driven analysis of large-scale biological data: vision paper;Altintas,2012

3. The process of structure-based drug design;Anderson;Chemistry and Biology,2003

4. A survey of cross-validation procedures for model selection;Arlot;Statistics Surveys,2010

5. Comparative analysis of classification algorithms on different datasets using weka;Arora;International Journal of Computer Applications,2012

Cited by 14 articles.

1. Provenance Information for Biomedical Data and Workflows: Scoping Review;Journal of Medical Internet Research;2024-08-23

2. A Systematic Review of Multi-Objective Evolutionary Algorithms Optimization Frameworks;Processes;2024-04-26

3. Traceable Research Data Sharing in a German Medical Data Integration Center With FAIR (Findability, Accessibility, Interoperability, and Reusability)-Geared Provenance Implementation: Proof-of-Concept Study;JMIR Formative Research;2023-12-07

4. Provenance Information for Biomedical Data and Workflows: Scoping Review (Preprint);2023-07-27

5. Capturing provenance information for biomedical data and workflows: A scoping review;2023-02-09