Getting Rid of Data

Author:

Milo Tova (1)

Affiliation:

1. School of Computer Science, Tel Aviv University, Tel Aviv, Israel

Abstract

We are experiencing an amazing data-centered revolution. Incredible amounts of data are collected, integrated, and analyzed, leading to key breakthroughs in science and society. This well of knowledge, however, is at great risk if we do not dispense with some of the data flood. First, the amount of generated data grows exponentially and, already by 2020, is expected to be more than twice the available storage. Second, even disregarding storage constraints, uncontrolled data retention puts privacy and security at risk, as recognized, e.g., by the recent EU Data Protection reform. Data disposal policies must be developed to benefit and protect organizations and individuals. Retaining the knowledge hidden in the data while respecting storage, processing, and regulatory constraints is a great challenge. The difficulty stems from the distinct, intricate requirements entailed by each type of constraint, the scale and velocity of data, and the constantly evolving needs. While multiple data sketching, summarization, and deletion techniques have been developed to address specific aspects of the problem, we are still very far from a comprehensive solution. Every organization has to battle the same tough challenges with ad hoc solutions that are application-specific and rarely sharable. In this article, we will discuss the logical, algorithmic, and methodological foundations required for the systematic disposal of large-scale data, for constraint enforcement, and for the development of applications over the retained information. In particular, we will overview relevant related work, highlighting new research challenges and potential reuse of existing techniques.

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems and Management, Information Systems

Cited by 4 articles.

1. Toward a Life Cycle Assessment for the Carbon Footprint of Data; Proceedings of the 2nd Workshop on Sustainable Computer Systems; 2023-07-09

2. Credit distribution in relational scientific databases; Information Systems; 2022-11

3. PHOcus; Proceedings of the VLDB Endowment; 2022-08

4. Silver Celebration of Open Data; Vestnik NSUEM; 2020-07-06
