Affiliation:
1. Leibniz Universität Hannover, Hannover, Germany
2. TU Berlin, Berlin, Germany
Abstract
A core operation in data discovery is finding tables that can be joined with a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow in the presence of n-ary keys. In this paper, we introduce Mate, a table discovery system that leverages a novel hash-based index to enable n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash function, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system prunes up to 1000x more false positives and achieves over 60x faster table discovery than the state of the art.
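The super-key idea in the abstract can be illustrated with a minimal sketch. This is not the Xash function itself, which encodes syntactic features of the values. As a stand-in, the sketch assumes a generic cryptographic hash that sets one bit per cell value, in the style of a Bloom filter. The names `value_mask`, `super_key`, `may_contain`, and the 128-bit width are hypothetical choices made for illustration.

```python
import hashlib

BITS = 128  # assumed super-key width; not taken from the paper


def value_mask(value: str, bits: int = BITS) -> int:
    """Map one cell value to a mask with a single bit set.

    Stand-in for Xash: a generic hash, not the paper's feature encoding.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") % bits
    return 1 << position


def super_key(values, bits: int = BITS) -> int:
    """Aggregate the masks of all values in a row with bitwise OR."""
    key = 0
    for value in values:
        key |= value_mask(str(value), bits)
    return key


def may_contain(row_key: int, query_values, bits: int = BITS) -> bool:
    """Return False only if the row cannot contain every query value.

    A True result may be a false positive and still needs an exact check.
    """
    query_key = super_key(query_values, bits)
    return (row_key & query_key) == query_key


# Index time: store one super key per row of a candidate table.
rows = [("Berlin", "Germany", 2022), ("Hannover", "Germany", 2021)]
index = [super_key(row) for row in rows]

# Query time: keep only rows whose super key covers the composite key.
query = ("Hannover", "Germany")
candidates = [row for row, key in zip(rows, index) if may_contain(key, query)]
```

Because the super key is built with OR, a row that really contains all query values always passes the filter. Rows that pass by hash collision are false positives, so a filter like this only narrows the candidate set before an exact comparison of the values.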
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
9 articles.