Authors:
Jedrzej Rybicki, Tatiana Frenklach, Rami Puzis
Abstract
Sample compression using an 𝜖-net effectively reduces the number of labeled instances required for accurate classification with nearest neighbor algorithms. However, one-shot construction of an 𝜖-net can be extremely challenging in large-scale distributed data sets. We explore two approaches for distributed sample compression: one where a local 𝜖-net is constructed for each data partition and the local nets are merged during an aggregation phase, and one where a single 𝜖-net backbone is constructed from one partition and aggregates target label distributions from the other partitions. Both approaches are applied to the problem of malware detection in a complex, real-world data set of Android apps using the nearest neighbor algorithm. Examination of the compression rate, computational efficiency, and predictive power shows that a single 𝜖-net backbone attains favorable performance while achieving a compression rate of 99%.
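The core idea behind 𝜖-net sample compression can be illustrated with a greedy construction: a point joins the net only if it lies farther than 𝜖 from every point already kept, and classification then uses nearest-neighbor lookup against the net alone. The sketch below is a minimal illustration under these assumptions; the function names, the toy data, and the greedy strategy are illustrative and are not taken from the paper's implementation.

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between two coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_eps_net(points, labels, eps):
    # Greedy epsilon-net: keep a point only if it is farther than
    # eps from every point already in the net, so every discarded
    # point has a representative within eps.
    net_pts, net_lbls = [], []
    for p, y in zip(points, labels):
        if all(euclidean(p, q) > eps for q in net_pts):
            net_pts.append(p)
            net_lbls.append(y)
    return net_pts, net_lbls

def nn_classify(query, net_pts, net_lbls):
    # 1-NN prediction using only the compressed net.
    i = min(range(len(net_pts)), key=lambda j: euclidean(query, net_pts[j]))
    return net_lbls[i]

# Toy data: two tight clusters with heavy redundancy.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
lbls = ["benign", "benign", "benign", "malware", "malware"]

net_pts, net_lbls = build_eps_net(pts, lbls, eps=0.5)
print(len(net_pts))                        # 2 representatives remain
print(nn_classify((0.2, 0.2), net_pts, net_lbls))  # "benign"
```

In the distributed setting described above, this routine could either run independently on each partition with the resulting nets merged afterwards, or run once on a single partition to form the backbone against which the other partitions' label distributions are aggregated.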
Funder
Helmholtz-Gemeinschaft
Forschungszentrum Jülich GmbH
Publisher
Springer Science and Business Media LLC