Current approaches for executing big data science projects—a systematic literature review

Authors:

Saltz Jeffrey S. (1), Krasteva Iva (2)

Affiliation:

1. Syracuse University, Syracuse, NY, United States of America

2. GATE Institute, Sofia University, Sofia, Bulgaria

Abstract

There is an increasing number of big data science projects aiming to create value for organizations by improving decision making, streamlining costs or enhancing business processes. However, many of these projects fail to deliver the expected value. It has been observed that a key reason many data science projects do not succeed is not technical in nature but lies in the process aspect of the project. The lack of established and mature methodologies for executing data science projects has been frequently noted as a reason for these project failures. To help move the field forward, this study presents a systematic review of research focused on the adoption of big data science process frameworks. The goal of the review was to identify (1) the key themes in current research on how teams execute data science projects, (2) the most common approaches regarding how data science projects are organized, managed and coordinated, (3) the activities involved in a data science project's life cycle, and (4) the implications for future research in this field. In short, the review identified 68 primary studies, classified thematically into six categories. Two of the themes (workflow and agility) accounted for approximately 80% of the identified studies. The findings regarding workflow approaches consist mainly of adaptations of CRISP-DM (rather than entirely new methodologies). With respect to agile approaches, most of the studies only explored the conceptual benefits of using an agile approach in a data science project (rather than evaluating an agile framework in actual use in a data science context). Hence, one finding from this research is that future research should explore how best to achieve the theorized benefits of agility. Another finding is the need to explore how to efficiently combine workflow and agile frameworks within a data science context to achieve a more comprehensive approach to project execution.

Publisher

PeerJ

Subject

General Computer Science
