Preference-aware integration of temporal data

Author:

Alexe Bogdan (1), Roth Mary (2), Tan Wang-Chiew (3)

Affiliation:

1. IBM Almaden

2. IBM Almaden and UCSC

3. UCSC

Abstract

A complete description of an entity is rarely contained in a single data source, but rather, it is often distributed across different data sources. Applications based on personal electronic health records, sentiment analysis, and financial records all illustrate that significant value can be derived from integrated, consistent, and queryable profiles of entities from different sources. Even more so, such integrated profiles are considerably enhanced if temporal information from different sources is carefully accounted for. We develop a simple and yet versatile operator, called prawn, that is typically called as a final step of an entity integration workflow. Prawn is capable of consistently integrating and resolving temporal conflicts in data that may contain multiple dimensions of time based on a set of preference rules specified by a user (hence the name prawn for preference-aware union). In the event that not all conflicts can be resolved through preferences, one can enumerate each possible consistent interpretation of the result returned by prawn at a given time point through a polynomial-delay algorithm. In addition to providing algorithms for implementing prawn, we study and establish several desirable properties of prawn. First, prawn produces the same temporally integrated outcome, modulo representation of time, regardless of the order in which data sources are integrated. Second, prawn can be customized to integrate temporal data for different applications by specifying application-specific preference rules. Third, we show experimentally that our implementation of prawn is feasible on both "small" and "big" data platforms in that it is efficient in both storage and execution time. Finally, we demonstrate a fundamental advantage of prawn: we illustrate that standard query languages can be immediately used to pose useful temporal queries over the integrated and resolved entity repository.

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Cited by 7 articles.

1. Temporal data exchange;Information Systems;2020-01

2. Computing possible and certain answers over order-incomplete data;Theoretical Computer Science;2019-12

3. Currency Preserving Query: Selecting the Newest Values from Multiple Tables;IEICE Transactions on Information and Systems;2018-12-01

4. Rule Sharing for Fraud Detection via Adaptation;2018 IEEE 34th International Conference on Data Engineering (ICDE);2018-04

5. Preference-driven similarity join;Proceedings of the International Conference on Web Intelligence;2017-08-23
