A Nonparametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles

Author:

Ben-Gal Irad (1), Bacher Marcelo (1), Amara Morris (1), Shmueli Erez (1)

Affiliation:

1. Department of Industrial Engineering, Tel Aviv University, 69978 Tel Aviv, Israel

Abstract

Identifying anomalies in multidimensional data sets is an important yet challenging task in many real-world applications. A special case arises when anomalies are occluded in a small subset of attributes. We propose a new subspace analysis approach, called agglomerative attribute grouping (AAG), that searches for subspaces composed of highly correlated (in the general sense) attributes. Such correlations among attributes better reflect the behavior of normal observations and hence can be used to improve the identification of abnormal data samples. The proposed AAG algorithm relies on a generalized multiattribute measure, derived from information theory measures over attributes' partitions, for evaluating the "information distance" among various subsets of attributes. To determine the set of subspaces, AAG applies a variation of the well-known agglomerative clustering algorithm with the proposed measure as the underlying distance function. Unlike existing methods, AAG does not require any parameter tuning. Finally, the set of informative subspaces can be used to improve subspace-based analytical tasks, such as anomaly detection, novelty detection, forecasting, and clustering. Extensive evaluation over real-world data sets demonstrates that (i) in the vast majority of cases, AAG outperforms both classical and state-of-the-art subspace analysis methods when used in anomaly and novelty detection ensembles; (ii) it often generates fewer subspaces with fewer attributes each, thus resulting in faster training times for the anomaly and novelty detection ensemble; and (iii) the generated subspaces can also be useful in other analytical tasks, such as clustering and forecasting.

History: Kwok-Leung Tsui served as the senior editor for this article.

Funding: This research was partially supported by the Israeli Ministry of Economy (METRO 450 Consortium within the frame of MAGNET program) as well as by the Koret foundation grant for Smart Cities and Digital Living 2030.

Data Ethics & Reproducibility Note: The code capsule is available on Code Ocean at https://codeocean.com/capsule/2526218/tree/v1 and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2023.0027).
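
The idea of agglomeratively grouping attributes by an information distance, as outlined in the abstract, can be illustrated with a minimal sketch. This is not the authors' AAG implementation (their code capsule is linked above). It assumes equal-frequency binning as the attribute partition, substitutes the normalized variation of information for the paper's generalized multiattribute measure, and uses a hypothetical `threshold` stopping rule, whereas AAG itself is described as requiring no parameter tuning. All function names are illustrative.

```python
import numpy as np

def entropy(labels):
    """Plug-in Shannon entropy (nats) of discrete samples; rows are observations."""
    _, counts = np.unique(labels, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def discretize(X, bins=4):
    """Equal-frequency binning per attribute (an assumption of this sketch)."""
    out = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, bins + 1)[1:-1])
        out[:, j] = np.searchsorted(edges, X[:, j])
    return out

def group_distance(D, a, b):
    """Normalized variation of information between attribute groups a and b.

    Equals (2*H(A,B) - H(A) - H(B)) / H(A,B), which lies in [0, 1]:
    0 when the groups determine each other, near 1 when they are independent.
    This is a stand-in for the paper's measure, not the measure itself.
    """
    h_ab = entropy(D[:, a + b])
    if h_ab == 0.0:
        return 0.0
    return (2 * h_ab - entropy(D[:, a]) - entropy(D[:, b])) / h_ab

def group_attributes(X, bins=4, threshold=0.9):
    """Greedy agglomerative grouping of columns; `threshold` is hypothetical."""
    D = discretize(X, bins)
    groups = [[j] for j in range(X.shape[1])]
    while len(groups) > 1:
        # Find the closest pair of groups under the information distance.
        dist, i, k = min(
            (group_distance(D, groups[i], groups[k]), i, k)
            for i in range(len(groups))
            for k in range(i + 1, len(groups))
        )
        if dist > threshold:
            break  # remaining groups are close to independent
        groups[i] = groups[i] + groups[k]
        del groups[k]
    return groups

# Toy data: attribute 1 is a nonlinear function of attribute 0,
# attribute 2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
X = np.column_stack([x0, np.exp(x0), rng.normal(size=2000)])
groups = group_attributes(X)
print(groups)
```

On this toy data one would expect attributes 0 and 1 to fall into the same subspace even though their relationship is nonlinear, while the independent attribute 2 stays on its own. Each resulting group could then serve as the feature subset for one member of an anomaly detection ensemble.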

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)
