Reusable tutorials for using cloud-based computing environments for the analysis of bacterial gene expression data from bulk RNA sequencing

Author:

Allers Steven (1), O’Connell Kyle A (2, 3), Carlson Thad (2, 3), Belardo David (4), King Benjamin L (1, 5, 6)

Affiliation:

1. University of Maine Department of Molecular and Biomedical Sciences, 5735 Hitchner Hall, Orono, ME 04469, United States

2. Center for Information Technology, National Institutes of Health, 6555 Rock Spring Dr, Bethesda, MD 20817, United States

3. Health Data and AI, Deloitte Consulting LLP, 1919 N. Lynn St, Arlington, VA 22203, United States

4. Google Cloud, Google, 1900 Reston Metro Plaza, Reston, VA 20190, United States

5. Maine Institutional Development Award Network of Biomedical Research Excellence (INBRE) Data Science Core, MDI Biological Laboratory, 159 Old Bar Harbor Rd, Bar Harbor, ME 04609, United States

6. University of Maine Graduate School of Biomedical Science and Engineering, 5775 Stodder Hall, Orono, ME 04469, United States

Abstract

This manuscript describes the development of a resource module that is part of a learning platform named “NIGMS Sandbox for Cloud-based Learning” (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial “NIGMS Sandbox” at the beginning of this Supplement. This module delivers learning materials on RNA sequencing (RNAseq) data analysis in an interactive format that uses appropriate cloud resources for data access and analyses. Biomedical research is increasingly data-driven and depends on data management and analysis methods that facilitate rigorous, robust, and reproducible research. Cloud-based computing resources provide opportunities to broaden the application of bioinformatics and data science in research. Two obstacles for researchers, particularly those at small institutions, are: (i) access to bioinformatics analysis environments tailored to their research; and (ii) training in how to use cloud-based computing resources. We developed five reusable tutorials for bulk RNAseq data analysis to address these obstacles. Using Jupyter notebooks run on the Google Cloud Platform, the tutorials guide the user through a workflow featuring an RNAseq dataset from a study of prophage-altered drug resistance in Mycobacterium chelonae. The first tutorial uses a subset of the data so users can learn the analysis steps rapidly, and the second uses the entire dataset. A third tutorial demonstrates how to analyze the read count data to generate lists of differentially expressed genes using R/DESeq2. Two additional tutorials generate read counts using the Snakemake workflow manager and Nextflow with Google Batch. All tutorials are open-source and can be used as templates for other analyses.
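As a rough illustration of the read-count analysis step mentioned in the abstract, the sketch below computes per-sample size factors with the median-of-ratios approach that DESeq2 applies by default when normalizing raw counts. It is not taken from the tutorials, which use R/DESeq2 directly; the function name `size_factors`, the sample labels, and the toy counts are all hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate median-of-ratios size factors.

    counts: dict mapping a sample name to a list of raw read counts,
            one entry per gene, in the same gene order for every sample.
    Returns a dict mapping each sample name to its size factor.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Keep only genes with a nonzero count in every sample,
    # so that the logarithm is defined for all of them.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]

    # Log of the per-gene geometric mean across samples.
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }

    # Size factor = median ratio of a sample's counts to the geometric means.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g]
                           for g in usable))
        for s in samples
    }


# Hypothetical toy data: sample_B was sequenced about twice as deeply as sample_A.
toy_counts = {
    "sample_A": [10, 20, 30, 0],
    "sample_B": [20, 40, 60, 5],
}
factors = size_factors(toy_counts)

# Dividing raw counts by the size factor puts samples on a comparable scale.
normalized = {s: [c / factors[s] for c in toy_counts[s]] for s in toy_counts}
```

Dividing each sample's counts by its size factor removes differences in sequencing depth before genes are compared between conditions; the statistical testing itself is left to DESeq2 in the tutorials.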

Funder

National Institute of General Medical Sciences of the National Institutes of Health to the Maine INBRE Program

Publisher

Oxford University Press (OUP)
