Transcriptomics and epigenetic data integration learning module on Google Cloud

Published:2024-07 Issue:Supplement_1 Volume:25
ISSN:1467-5463
Container-title:Briefings in Bioinformatics
Language:en

Author:

Ruprecht Nathan A¹, Kennedy Joshua D¹², Bansal Benu¹, Singhal Sonalika³, Sens Donald³, Maggio Angela⁴, Doe Valena⁵, Hawkins Dale⁵, Campbell Ross⁶, O’Connell Kyle⁶, Gill Jappreet Singh¹, Schaefer Kalli¹, Singhal Sandeep K¹³

Affiliation:

1. University of North Dakota Department of Biomedical Engineering, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States

2. Drury University Department of Chemistry and Physics, 900 N. Benton Avenue, Springfield, MO 65802, United States

3. University of North Dakota Department of Pathology, 1301 N. Columbia Road Stop 9037, Grand Forks, ND 58202, United States

4. Deloitte, Health Data and AI, Deloitte Consulting LLP, 1919 N. Lynn Street, Suite 1500, Arlington, VA 22209, United States

5. Google, Google Cloud, 1900 Reston Metro Plaza, Reston, VA 20190, United States

6. NIH Center for Information Technology (CIT), 6555 Rock Spring Drive, Bethesda, MD 20892, United States

Abstract

Multi-omics (genomics, transcriptomics, epigenomics, proteomics, metabolomics, etc.) research approaches are vital for understanding the hierarchical complexity of human biology and have proven to be extremely valuable in cancer research and precision medicine. Emerging scientific advances in recent years have made high-throughput genome-wide sequencing a central focus in molecular research by allowing for the collective analysis of various kinds of molecular biological data from different types of specimens in a single tissue or even at the level of a single cell. Additionally, with the help of improved computational resources and data mining, researchers are able to integrate data from different multi-omics regimes to identify new prognostic, diagnostic, or predictive biomarkers, uncover novel therapeutic targets, and develop more personalized treatment protocols for patients. For the research community to parse scientifically and clinically meaningful information out of the biological data generated each day more efficiently and with fewer wasted resources, familiarity with advanced analytical tools such as Google Cloud Platform becomes imperative. This project is an interdisciplinary, cross-organizational effort to provide a guided learning module for integrating transcriptomics and epigenetics data analysis protocols into a comprehensive analysis pipeline for users to implement in their own work, utilizing the cloud computing infrastructure on Google Cloud. The learning module consists of three submodules that guide the user through tutorial examples that illustrate the analysis of RNA sequencing (RNA-seq) and reduced-representation bisulfite sequencing (RRBS) data. The examples are in the form of breast cancer case studies, and the data sets were procured from the public repository Gene Expression Omnibus.
The first submodule is devoted to transcriptomics analysis with the RNA sequencing data, the second submodule focuses on epigenetics analysis using the DNA methylation data, and the third submodule integrates the two methods for a deeper biological understanding. The modules begin with data collection and preprocessing, with further downstream analysis performed in a Vertex AI Jupyter notebook instance with an R kernel. Analysis results are returned to Google Cloud buckets for storage and visualization, removing the computational strain from local resources. The final product is a start-to-finish tutorial that enables researchers with limited experience in multi-omics to integrate transcriptomics and epigenetics data analysis into a comprehensive pipeline for their own biological research. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [16] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.

Funder

National Institute of General Medical Sciences of the National Institutes of Health

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bib/article-pdf/25/Supplement_1/bbae352/58732181/bbae352.pdf

References (61 articles)

1. Multi-omics data integration considerations and study design for biological systems and disease;Graw;Mol Omics,2021

2. Multi-omics approaches for biomarker discovery in early ovarian cancer diagnosis;Xiao;EBioMedicine,2022

3. Multi-omics analyses identify molecular signatures with prognostic values in different heart failure aetiologies;Aboumsallem;J Mol Cell Cardiol,2023

4. Single-cell multi-omics advances in lymphoma research;Jin;Oncol Rep,2023

5. Location-specific signatures of Crohn’s disease at a multi-omics scale;Gonzalez;Microbiome,2022