Materialization and Reuse Optimizations for Production Data Science Pipelines

Author:

Derakhshan Behrouz (1), Rezaei Mahdiraji Alireza (2), Kaoudi Zoi (3), Rabl Tilmann (4), Markl Volker (5)

Affiliation:

1. DFKI GmbH, Berlin, Germany

2. Yara Digital Production, Berlin, Germany

3. TU Berlin, Berlin, Germany

4. HPI & University of Potsdam, Potsdam, Germany

5. TU Berlin & DFKI GmbH, Berlin, Germany

Funder:

German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data

German Federal Ministry for Economic Affairs and Climate Action, Project "ExDRa"

Publisher:

ACM

Cited by 5 articles.

1. Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie. Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, 2024-06-09.

2. To Store or Not to Store: a graph theoretical approach for Dataset Versioning. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024-05-27.

3. HYPPO: Using Equivalences to Optimize Pipelines in Exploratory Machine Learning. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024-05-13.

4. Optimizing Data Pipelines for Machine Learning in Feature Stores. Proceedings of the VLDB Endowment, 2023-09.

5. Raven. Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2023-06-18.
