Authors:
Tripathi Shailesh, Muhr David, Brunner Manuel, Jodlbauer Herbert, Dehmer Matthias, Emmert-Streib Frank
Abstract
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework partitions the often complex data mining process into orderly phases to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust, industry-specific, data-driven knowledge discovery models faces multiple issues related to data and model development. These issues need to be carefully addressed by a flexible, customized and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call the Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases so that data- and model-related issues can be adequately addressed to achieve robustness. Furthermore, it also emphasizes the need for a detailed business understanding and its interdependencies with the developed models and data quality in order to fulfill higher business objectives. Overall, such a customizable GCRISP-DS framework supports model improvement and reusability by minimizing robustness issues.