Abstract
As a result of the paradigm shift away from rather rigid data warehouses to general-purpose data lakes, fully flexible self-service analytics is made possible. However, this also increases the complexity for domain experts who perform these analyses, since comprehensive data preparation tasks have to be implemented for each data access. For this reason, we developed BARENTS, a toolset that enables domain experts to specify data preparation tasks as ontology rules, which are then applied to the data involved. Although our evaluation of BARENTS showed that it is a valuable contribution to self-service analytics, a major drawback is that domain experts do not receive any semantic support when specifying the rules. In this paper, we therefore address how a recommender approach can provide additional support to domain experts by identifying supplementary datasets that might be relevant for their analyses or additional data processing steps to improve data refinement. This recommender operates on the set of data preparation rules specified in BARENTS, i.e., the accumulated knowledge of all domain experts is factored into the data preparation for each new analysis. Evaluation results indicate that such a recommender approach further contributes to the practicality of BARENTS and thus represents a step towards effective and efficient self-service analytics in data lakes.
Publisher
Springer Science and Business Media LLC
Subject
General Earth and Planetary Sciences,General Environmental Science