Affiliation:
1. Peking University, Beijing, China
Abstract
Finding top-K frequent items has been a hot topic in data stream processing in recent years and has a wide range of applications. However, most existing sketch algorithms focus on finding the local top-K in a single data stream. In this paper, we work on finding the global top-K across multiple disjoint data streams. We find that directly deploying prior sketch algorithms is often unfair in global scenarios, which degrades the accuracy of the global top-K. We define top-K-fairness and show that it is important for finding the global top-K. To achieve top-K-fairness, we propose a new sketch framework, called the Double-Anonymous sketch. The process of finding global top-K items resembles paper reviewing and democratic elections, where double-anonymity is often an effective strategy for achieving top-K-fairness. We also propose two techniques, hot panning and early freezing, to further improve the accuracy. We theoretically prove that the Double-Anonymous sketch achieves top-K-fairness while keeping high accuracy. We perform extensive experiments to verify top-K-fairness in the scenario of disjoint data streams. The experimental results show that the Double-Anonymous sketch's error is up to 129 times (60 times on average) smaller than that of the state-of-the-art. All related source code is open-sourced and available on GitHub.
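The gap between a local and a global top-K shows up even without a sketch. Below is a minimal, hypothetical Python illustration. It is not the Double-Anonymous sketch and not the paper's formal definition of top-K-fairness. Exact per-stream counters stand in for a per-stream top-K sketch, and the naive global answer is assumed to come from merging each stream's local top-K list. The stream contents and the helper names `local_top_k`, `merged_local_top_k`, and `exact_global_top_k` are invented for this example.

```python
from collections import Counter

def local_top_k(stream, k):
    """Exact top-k of one stream (a stand-in for any per-stream top-K sketch)."""
    return Counter(stream).most_common(k)

def merged_local_top_k(streams, k):
    """Naive global answer (assumed scheme): union the per-stream top-k lists,
    sum the reported counts, and re-rank. Items that never reach a local
    top-k are discarded before the merge."""
    merged = Counter()
    for s in streams:
        for item, count in local_top_k(s, k):
            merged[item] += count
    return [item for item, _ in merged.most_common(k)]

def exact_global_top_k(streams, k):
    """Ground truth: count every item over the union of all disjoint streams."""
    total = Counter()
    for s in streams:
        total.update(s)
    return [item for item, _ in total.most_common(k)]

# Hypothetical disjoint streams: "x" is moderately frequent everywhere,
# while each stream has its own locally dominant items.
streams = [
    ["a"] * 12 + ["b"] * 9 + ["x"] * 8,
    ["c"] * 10 + ["d"] * 9 + ["x"] * 8,
    ["e"] * 11 + ["f"] * 9 + ["x"] * 8,
]

print(exact_global_top_k(streams, 2))  # ['x', 'a']
print(merged_local_top_k(streams, 2))  # ['a', 'e']
```

Item `x` occurs 24 times in total, at least twice as often as any other item, yet it never enters a local top-2 list, so the merged answer drops it. This is one simple way in which per-stream decisions can distort a global ranking.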
Funder
Key-Area Research and Development Program of Guangdong Province
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
References (57 articles)
1. 2004. Real-Life Transactional Dataset. http://fimi.ua.ac.be/data/.
2. 2016. Anonymized Internet Traces 2016. https://catalog.caida.org/dataset/passive_2016_pcap.
3. 2023. Source code related to Double-Anonymous sketch. https://github.com/Arimase97/Double-Anonymous-Sketch.
4. K. Balachander, S. Subhabrata, Z. Yin, and C. Yan. 2003. Sketch-based change detection: methods, evaluation, and applications. In SIGCOMM.
5. Ran Ben-Basat, Gil Einziger, Roy Friedman, et al. 2017. Randomized admission policy for efficient top-k and frequency estimation. In INFOCOM.
Cited by
3 articles.