Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items

Author:

Zhao Yikai (1), Han Wenchen (1), Zhong Zheng (1), Zhang Yinda (1), Yang Tong (1), Cui Bin (1)

Affiliation:

1. Peking University, Beijing, China

Abstract

Finding top-K frequent items has been a hot topic in data stream processing in recent years and has a wide range of applications. However, most existing sketch algorithms focus on finding the local top-K in a single data stream. In this paper, we work on finding the global top-K across multiple disjoint data streams. We find that directly deploying prior sketch algorithms is often unfair in global scenarios, which degrades the accuracy of the global top-K. We define top-K-fairness and show that it is important for finding the global top-K. To achieve top-K-fairness, we propose a new sketch framework called the Double-Anonymous sketch. The process of finding global top-K items is similar to that of paper reviewing and democratic elections. In these scenarios, double-anonymity is often an effective strategy to achieve top-K-fairness. We also propose two techniques, hot panning and early freezing, to further improve the accuracy. We theoretically prove that the Double-Anonymous sketch achieves top-K-fairness while keeping high accuracy. We perform extensive experiments to verify top-K-fairness in the scenario of disjoint data streams. The experimental results show that the Double-Anonymous sketch's error is up to 129 times (60 times on average) smaller than that of the state-of-the-art. All the related source code is open-sourced and available on GitHub.

Funder

Key-Area Research and Development Program of Guangdong Province

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

References (57 articles)

1. 2004. Real-Life Transactional Dataset. http://fimi.ua.ac.be/data/.

2. 2016. Anonymized Internet Traces 2016. https://catalog.caida.org/dataset/passive_2016_pcap.

3. 2023. Source code related to Double-Anonymous sketch. https://github.com/Arimase97/Double-Anonymous-Sketch.

4. K. Balachander, S. Subhabrata, Z. Yin, and C. Yan. 2003. Sketch-based change detection: methods, evaluation, and applications. In SIGCOMM.

5. Ran Ben-Basat, Gil Einziger, Roy Friedman, et al. 2017. Randomized admission policy for efficient top-k and frequency estimation. In INFOCOM.

Cited by 3 articles.

1. WavingSketch: an unbiased and generic sketch for finding top-k items in data streams;The VLDB Journal;2024-07-29

2. Online Detection of Outstanding Quantiles with QuantileFilter;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

3. Identifying Performance Bottleneck in Shared In-Network Aggregation during Distributed Training;2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS);2023-12-17
