Abstract
A complete description of an entity is rarely contained in a single data source; more often, it is distributed across several. Applications based on personal electronic health records, sentiment analysis, and financial records all illustrate that significant value can be derived from integrated, consistent, and queryable profiles of entities drawn from different sources. Such integrated profiles become considerably more valuable when the temporal information from the different sources is carefully accounted for.
We develop a simple and yet versatile operator, called prawn, that is typically called as a final step of an entity integration workflow. Prawn is capable of consistently integrating and resolving temporal conflicts in data that may contain multiple dimensions of time based on a set of preference rules specified by a user (hence the name prawn for preference-aware union). In the event that not all conflicts can be resolved through preferences, one can enumerate each possible consistent interpretation of the result returned by prawn at a given time point through a polynomial-delay algorithm. In addition to providing algorithms for implementing prawn, we study and establish several desirable properties of prawn. First, prawn produces the same temporally integrated outcome, modulo representation of time, regardless of the order in which data sources are integrated. Second, prawn can be customized to integrate temporal data for different applications by specifying application-specific preference rules. Third, we show experimentally that our implementation of prawn is feasible on both "small" and "big" data platforms in that it is efficient in both storage and execution time. Finally, we demonstrate a fundamental advantage of prawn: we illustrate that standard query languages can be immediately used to pose useful temporal queries over the integrated and resolved entity repository.
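To make the idea of a preference-aware union concrete, the following is a minimal toy sketch, not the algorithm from the paper. It assumes a single time dimension, half-open integer intervals, and a single kind of preference rule (a ranking over sources); the names Fact, source_rank, and preference_aware_union are hypothetical and chosen only for illustration. When two sources disagree on overlapping time points, the value from the more preferred source is kept; if the preferences cannot break the tie, all candidate values are retained for that interval.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    # Hypothetical record shape: one attribute value valid on [start, end).
    entity: str
    attribute: str
    value: str
    start: int  # inclusive
    end: int    # exclusive
    source: str


def preference_aware_union(facts, source_rank):
    """Toy union: on each elementary interval, keep the value(s) coming
    from the most preferred source (lower rank = more preferred)."""
    groups = {}
    for f in facts:
        groups.setdefault((f.entity, f.attribute), []).append(f)

    result = []
    for (entity, attribute), group in sorted(groups.items()):
        # Split time at every endpoint so each slice has a fixed set of facts.
        points = sorted({t for f in group for t in (f.start, f.end)})
        for lo, hi in zip(points, points[1:]):
            active = [f for f in group if f.start <= lo and hi <= f.end]
            if not active:
                continue
            best = min(source_rank[f.source] for f in active)
            values = sorted(
                {f.value for f in active if source_rank[f.source] == best}
            )
            # More than one value means the conflict is left unresolved.
            result.append((entity, attribute, lo, hi, values))
    return result


# Illustrative data (invented for this sketch).
facts = [
    Fact("alice", "employer", "IBM", 2010, 2015, "linkedin"),
    Fact("alice", "employer", "Acme", 2013, 2018, "hr_db"),
]
rank = {"hr_db": 0, "linkedin": 1}  # assumed rule: prefer hr_db over linkedin
merged = preference_aware_union(facts, rank)
```

In this sketch the output does not depend on the order of the input facts, because every step sorts or uses sets. Adjacent intervals carrying the same value are deliberately not coalesced, so two results may differ in how time is represented while describing the same integrated outcome.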
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
7 articles.
1. Temporal data exchange;Information Systems;2020-01
2. Computing possible and certain answers over order-incomplete data;Theoretical Computer Science;2019-12
3. Currency Preserving Query: Selecting the Newest Values from Multiple Tables;IEICE Transactions on Information and Systems;2018-12-01
4. Rule Sharing for Fraud Detection via Adaptation;2018 IEEE 34th International Conference on Data Engineering (ICDE);2018-04
5. Preference-driven similarity join;Proceedings of the International Conference on Web Intelligence;2017-08-23