Abstract
Domain experts play an important role in data science, as their knowledge can unlock valuable insights from data. Because they often lack the technical skills required to analyze data, they need to collaborate with technical experts. In these joint efforts, productive collaboration is critical not only while constructing a data science task but, more importantly, while executing it. This need stems from the inherent complexity of data science, which often involves user-defined functions or machine-learning operations. Consequently, collaborators want to interact with a task at runtime, for example by pausing and resuming the execution, inspecting an operator's state, and modifying an operator's logic. To meet this need, we have spent the past few years developing Texera, an open-source system that supports collaborative data analytics using GUI-based workflows as cloud services. In this paper, we present a holistic view of several important design principles we followed in designing and implementing the system. We focus on different methods of sending messages to running workers, how these methods are adopted to support various runtime interactions from users, and their trade-offs in both performance and consistency. These principles enable Texera to provide powerful user interactions during a workflow execution and thereby facilitate efficient collaboration in data analytics.
Publisher
Association for Computing Machinery (ACM)