On the Feasibility of Forgetting in Data Streams

Author:

Pavan A. (1), Chakraborty Sourav (2), Vinodchandran N. V. (3), Meel Kuldeep S. (4)

Affiliation:

1. Iowa State University, Ames, IA, USA

2. Indian Statistical Institute, Kolkata, WB, India

3. University of Nebraska-Lincoln, Lincoln, NE, USA

4. University of Toronto, Toronto, ON, Canada

Abstract

In today's digital age, it is becoming increasingly prevalent to retain digital footprints in the cloud indefinitely. Nonetheless, there is a valid argument that entities should have the authority to decide whether their personal data remains within a specific database or is expunged. Indeed, nations across the globe are increasingly enacting legislation to uphold the "Right To Be Forgotten" for individuals. Investigating the computational challenges involved, including the formalization and implementation of this notion, is crucial due to its relevance in the domains of data privacy and management. This work introduces a new streaming model: the "Right to be Forgotten Data Streaming Model" (RFDS model). The main feature of this model is that any element in the stream has the right to have its history removed from the stream. Formally, the input is a stream of updates of the form (a, Δ), where Δ ∈ {+, ⊥} and a is an element from a universe U. When the update Δ = + occurs, the frequency of a, denoted f_a, is incremented to f_a + 1. When the update Δ = ⊥ occurs, f_a is set to 0. This feature, which represents the forget request, distinguishes the present model from existing data streaming models. This work systematically investigates the computational challenges that arise when incorporating the notion of the right to be forgotten. Our initial considerations reveal that even estimating F_1 (the sum of the frequencies of the elements) of the stream is a non-trivial problem in this model. Based on these initial investigations, we focus on a modified model, which we call α-RFDS, where we limit the number of forget operations to at most an α fraction. In this modified model, we focus on estimating F_0 (the number of distinct elements) and F_1. We present algorithms and establish almost-matching lower bounds on the space complexity for these computational tasks.
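
To make the update semantics concrete, the following is a minimal sketch of an exact, linear-space reference for the RFDS model described in the abstract. It is not one of the paper's algorithms, which aim for small space; it only illustrates how (a, +) and (a, ⊥) updates affect f_a, F_0, and F_1. The function name `exact_moments` and the constants `INCREMENT` and `FORGET` are hypothetical, and the sketch assumes F_0 counts the elements whose current frequency is nonzero.

```python
from collections import Counter

INCREMENT = "+"   # update that raises f_a by 1
FORGET = "⊥"      # forget request: resets f_a to 0


def exact_moments(stream):
    """Process (a, delta) updates exactly and return (F_0, F_1).

    Illustrative baseline only: it stores one counter per live element,
    so its space grows with the number of distinct elements.
    Assumes F_0 = number of elements with f_a > 0, F_1 = sum of all f_a.
    """
    freq = Counter()
    for a, delta in stream:
        if delta == INCREMENT:
            freq[a] += 1
        elif delta == FORGET:
            # Setting f_a to 0 is modeled by dropping the entry entirely.
            freq.pop(a, None)
        else:
            raise ValueError(f"unknown update type: {delta!r}")
    f0 = len(freq)            # elements with nonzero frequency
    f1 = sum(freq.values())   # total frequency of surviving elements
    return f0, f1


# Hypothetical example stream: x is inserted twice, then forgotten.
example = [("x", "+"), ("y", "+"), ("x", "+"), ("x", "⊥"), ("z", "+")]
result = exact_moments(example)
```

In this example the forget request erases both earlier occurrences of x at once, so only y and z contribute to the final counts. This is what separates a forget request from a single decrement.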

Funder

NSF

National Research Foundation Singapore

Ministry of Education Singapore

Publisher

Association for Computing Machinery (ACM)

