Affiliation:
1. University of Illinois Urbana-Champaign
2. University of Michigan - Ann Arbor
Abstract
Computational notebooks (e.g., Jupyter, Google Colab) are widely used for interactive data science and machine learning. In these frameworks, users start a session, then execute cells (i.e., sets of statements) to create variables, train models, visualize results, and so on. Unfortunately, existing notebook systems do not offer live migration: when a notebook launches on a new machine, it loses its state, preventing users from continuing their tasks from where they left off. This is because, unlike a DBMS, notebook sessions rely directly on underlying kernels (e.g., Python/R interpreters) without an additional data management layer. Existing techniques for preserving state, such as copying all variables or OS-level checkpointing, are unreliable (they often fail), inefficient, and platform-dependent, while re-running code from scratch can be highly time-consuming.
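To illustrate why copying every variable can fail, the following minimal sketch uses Python's standard `pickle` module as a stand-in for a "copy all variables" approach. The abstract does not say which copying mechanism it has in mind, so the choice of `pickle`, the `session_state` dictionary, and the `try_copy` helper are all assumptions made for illustration.

```python
import pickle

# Hypothetical session state: one plain value and one generator.
# A generator holds live interpreter state, which pickle cannot serialize.
session_state = {
    "numbers": [1, 2, 3],
    "lazy_rows": (n * n for n in range(3)),
}


def try_copy(state):
    """Copy each variable via a pickle round trip.

    Returns (copied, failed), where `failed` maps a variable name to the
    name of the exception raised while serializing it.
    """
    copied, failed = {}, {}
    for name, value in state.items():
        try:
            copied[name] = pickle.loads(pickle.dumps(value))
        except Exception as exc:  # report the failure instead of aborting
            failed[name] = type(exc).__name__
    return copied, failed


copied, failed = try_copy(session_state)
print("copied:", sorted(copied))
print("failed:", failed)
```

Here the list survives the round trip while the generator does not, so a migration that relies only on copying would silently lose part of the session.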
In this paper, we introduce a new notebook system, ElasticNotebook, that offers live migration via checkpointing/restoration using a novel mechanism that is reliable, efficient, and platform-independent. Specifically, by observing all cell executions via transparent, lightweight monitoring, ElasticNotebook can find a reliable and efficient way (i.e., a replication plan) to reconstruct the original session state, considering variable-cell dependencies, observed runtimes, variable sizes, etc. To this end, our new graph-based optimization problem finds how to efficiently reconstruct all variables from a subset of variables that can be transferred across machines. We show that ElasticNotebook reduces end-to-end migration and restoration times by 85%-98% and 94%-99%, respectively, on a variety of notebooks (Kaggle, JWST, and Tutorial), with negligible runtime and memory overheads of <2.5% and <10%, respectively.
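The trade-off behind a replication plan can be sketched with a toy example. This is not ElasticNotebook's graph-based optimization: it is an exhaustive search over a hypothetical three-cell session, with made-up runtimes, sizes, and bandwidth, and it assumes each variable is written by exactly one cell. It only shows the idea of choosing which variables to transfer and which to recompute.

```python
from itertools import combinations

# Hypothetical toy session: runtime in seconds, plus variables read/written.
CELLS = {
    "load":  {"runtime": 2.0,  "reads": set(),     "writes": {"df"}},
    "train": {"runtime": 60.0, "reads": {"df"},    "writes": {"model"}},
    "plot":  {"runtime": 1.0,  "reads": {"model"}, "writes": {"fig"}},
}
# Variable sizes in MB; None marks a variable that cannot be serialized.
SIZES_MB = {"df": 500.0, "model": 20.0, "fig": None}
BANDWIDTH_MB_S = 50.0  # assumed transfer speed between machines

# Map each variable to the single cell that produces it.
PRODUCER = {v: c for c, info in CELLS.items() for v in info["writes"]}


def cells_to_rerun(migrated):
    """Cells that must be re-executed to rebuild every non-migrated variable."""
    needed = set()
    stack = [v for v in SIZES_MB if v not in migrated]
    while stack:
        cell = PRODUCER[stack.pop()]
        if cell in needed:
            continue
        needed.add(cell)
        # Inputs that were not migrated must themselves be recomputed.
        stack.extend(r for r in CELLS[cell]["reads"] if r not in migrated)
    return needed


def plan_cost(migrated):
    """Estimated seconds: transfer time plus runtime of re-executed cells."""
    transfer = sum(SIZES_MB[v] for v in migrated) / BANDWIDTH_MB_S
    rerun = sum(CELLS[c]["runtime"] for c in cells_to_rerun(migrated))
    return transfer + rerun


def best_plan():
    """Try every subset of serializable variables and keep the cheapest."""
    candidates = [v for v, size in SIZES_MB.items() if size is not None]
    subsets = (set(c) for k in range(len(candidates) + 1)
               for c in combinations(candidates, k))
    return min(subsets, key=plan_cost)


plan = best_plan()
print("migrate:", plan, "rerun:", cells_to_rerun(plan))
```

In this toy session the cheapest plan transfers only `model` and reruns `load` and `plot`: the small model is expensive to retrain, while the large dataframe is cheap to reload. Exhaustive search is exponential in the number of variables, which is one reason a real system would need a more scalable formulation.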
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
1 article.
1. Demonstration of ElasticNotebook: Migrating Live Computational Notebook States; Companion of the 2024 International Conference on Management of Data; 2024-06-09