Abstract
Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.
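To make the books example concrete, the following is a minimal sketch of what it means for a bOD to hold on a small table. It is a naive pairwise validity check, not the DISTOD algorithm and not a discovery procedure; the function names `precedes` and `bod_holds`, the `("attribute", "asc" | "desc")` encoding, and the sample rows are assumptions made purely for illustration.

```python
from itertools import permutations


def precedes(s, t, spec):
    """Return True if row s sorts before (or ties with) row t under spec.

    spec is a list of (attribute, direction) pairs, compared lexicographically;
    direction is "asc" or "desc" (hypothetical encoding for this sketch).
    """
    for attr, direction in spec:
        if s[attr] == t[attr]:
            continue  # tie on this attribute, look at the next one
        smaller = s[attr] < t[attr]
        return smaller if direction == "asc" else not smaller
    return True  # rows are equal on every listed attribute


def bod_holds(rows, lhs, rhs):
    """Naive O(n^2) check: whenever s precedes t under lhs, it must also under rhs."""
    return all(
        precedes(s, t, rhs)
        for s, t in permutations(rows, 2)
        if precedes(s, t, lhs)
    )


# Illustrative data only: a later publication year means a lower age.
books = [
    {"title": "A", "published": 1990, "age": 35},
    {"title": "B", "published": 2005, "age": 20},
    {"title": "C", "published": 2005, "age": 20},
    {"title": "D", "published": 2020, "age": 5},
]

# Sorting by publication date ascending also sorts by age descending.
holds = bod_holds(books, [("published", "asc")], [("age", "desc")])

# The same left-hand side does not order the rows by age ascending.
violated = bod_holds(books, [("published", "asc")], [("age", "asc")])
```

Checking a single candidate this way is quadratic in the number of rows. The cost the abstract refers to is a different one: the number of candidate bODs grows exponentially with the number of attributes, which is why pruning and distribution matter for discovery.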
Funder
Hasso-Plattner-Institut für Digital Engineering gGmbH
Publisher
Springer Science and Business Media LLC
Subject
Hardware and Architecture, Information Systems
Cited by
7 articles.