Affiliation:
1. Faculty of Mathematics and Computer Science, Jagiellonian University
2. Informatica Polska
Abstract
Maintaining data warehouses and ETL processes is becoming increasingly difficult. For this reason, we introduce a similarity measure on ETL processes based on the edit distance of the graph that models the process. We show how to compute it exactly and present heuristic approaches that estimate the similarity more quickly. We also propose methods that improve graph edit distance computation under the assumption that the ETL process model is a directed acyclic graph.
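As a rough illustration of how an edit-distance-based similarity can be computed, the minimal sketch below finds the exact graph edit distance between two tiny labeled process graphs by brute force and normalizes it to a value in [0, 1]. The unit edit costs, the normalization by total graph size, and all names (`graph_edit_distance`, `similarity`, the example processes) are assumptions made for this illustration, not the measure defined in the paper. The sketch also does not exploit the acyclicity of the process model.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2):
    """Exhaustive graph edit distance with unit costs (assumed cost model).

    nodes*: dict mapping node id -> label (e.g. the ETL operator type)
    edges*: set of directed edges (source id, target id)
    Only practical for very small graphs: it tries every node mapping.
    """
    ids1, ids2 = list(nodes1), list(nodes2)
    n = max(len(ids1), len(ids2))
    # Pad the smaller side with None: a None partner means insert/delete.
    ids1 += [None] * (n - len(ids1))
    ids2 += [None] * (n - len(ids2))

    best = None
    for perm in permutations(ids2):
        pairs = list(zip(ids1, perm))
        node_cost = 0
        for u, v in pairs:
            if u is None or v is None:
                node_cost += 1  # node insertion or deletion
            elif nodes1[u] != nodes2[v]:
                node_cost += 1  # label substitution
        mapping = {u: v for u, v in pairs if u is not None}
        # An edge is preserved if its mapped endpoints form an edge in graph 2.
        kept = sum((mapping[a], mapping[b]) in edges2 for a, b in edges1)
        edge_cost = (len(edges1) - kept) + (len(edges2) - kept)
        cost = node_cost + edge_cost
        if best is None or cost < best:
            best = cost
    return best


def similarity(nodes1, edges1, nodes2, edges2):
    """Map the distance to [0, 1]; 1.0 means identical graphs (assumed normalization)."""
    size = len(nodes1) + len(edges1) + len(nodes2) + len(edges2)
    if size == 0:
        return 1.0
    return 1.0 - graph_edit_distance(nodes1, edges1, nodes2, edges2) / size


# Hypothetical ETL processes: the variant adds a filter step.
base_nodes = {"a": "extract", "b": "transform", "c": "load"}
base_edges = {("a", "b"), ("b", "c")}
variant_nodes = {"p": "extract", "q": "filter", "r": "transform", "s": "load"}
variant_edges = {("p", "q"), ("q", "r"), ("r", "s")}

distance = graph_edit_distance(base_nodes, base_edges, variant_nodes, variant_edges)
score = similarity(base_nodes, base_edges, variant_nodes, variant_edges)
```

The exhaustive search grows factorially with the number of nodes, which is why heuristic estimates become necessary for process graphs of realistic size.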
Publisher
Uniwersytet Jagiellonski - Wydawnictwo Uniwersytetu Jagiellonskiego