An accessible infrastructure for artificial intelligence using a Docker-based JupyterLab in Galaxy

Published: 2022-12-28
Volume: 12
ISSN: 2047-217X
Container-title: GigaScience
Language: en

Author:

Kumar Anup¹, Cuccuru Gianmauro¹, Grüning Björn¹, Backofen Rolf¹,²

Affiliation:

1. Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany

2. Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany

Abstract

Background: Artificial intelligence (AI) programs that train on large datasets require powerful compute infrastructure consisting of several CPU cores and GPUs. JupyterLab provides an excellent framework for developing AI programs, but it needs to be hosted on such an infrastructure to enable faster training of AI programs using parallel computing.

Findings: An open-source, Docker-based, and GPU-enabled JupyterLab infrastructure is developed that runs on the public compute infrastructure of Galaxy Europe, consisting of thousands of CPU cores, many GPUs, and several petabytes of storage, to rapidly prototype and develop end-to-end AI projects. Using a JupyterLab notebook, long-running AI model training programs can also be executed remotely to create trained models, represented in Open Neural Network Exchange (ONNX) format, and other output datasets in Galaxy. Other features include Git integration for version control, the option of creating and executing pipelines of notebooks, multiple dashboards for monitoring compute resources, and packages for visualization.

Conclusions: These features make JupyterLab in Galaxy Europe highly suitable for creating and managing AI projects. A recent scientific publication that predicts infected regions in COVID-19 computed tomography scan images is reproduced using various features of JupyterLab on Galaxy Europe. In addition, ColabFold, a faster implementation of AlphaFold2, is accessed in JupyterLab to predict the 3-dimensional structure of protein sequences. JupyterLab is accessible in 2 ways: as an interactive Galaxy tool or by running the underlying Docker container. In both cases, long-running training can be executed on Galaxy's compute infrastructure. Scripts to create the Docker container are available under the MIT license at https://github.com/usegalaxy-eu/gpu-jupyterlab-docker.

Funder

DFG

Bundesministerium für Bildung und Frauen

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications,Health Informatics

Link

https://academic.oup.com/gigascience/article-pdf/doi/10.1093/gigascience/giad028/50099539/giad028.pdf
