Outlier mining in high-dimensional data using the Jensen–Shannon divergence and graph structure analysis

Published:2022-12-01 Issue:4 Volume:3 Page:045011
ISSN:2632-072X
Container-title:Journal of Physics: Complexity
Short-container-title:J. Phys. Complex.

Author:

Toledo Alex S O, Silini Riccardo, Carpi Laura C, Masoller Cristina

Abstract

Reliable anomaly/outlier detection algorithms have practical applications in many fields. For instance, anomaly detection makes it possible to filter and clean the data used to train machine learning algorithms, improving their performance. However, outlier mining is challenging when the data is high-dimensional, and different approaches have been proposed for different types of data (temporal, spatial, network, etc.). Here we propose a methodology to mine outliers in generic datasets in which it is possible to define a meaningful distance between elements of the dataset. The methodology is based on defining a fully connected, undirected graph, where the nodes are the elements of the dataset and the links have weights that are the distances between the nodes. Outlier scores are defined by analyzing the structure of the graph, in particular by using the Jensen–Shannon (JS) divergence to compare the distributions of weights of different nodes. We demonstrate the method using a publicly available database of credit-card transactions, where some of the transactions are labeled as frauds. We compare with the performance obtained when using Euclidean distances and graph percolation, and show that the JS divergence leads to a performance improvement but increases the computational cost.
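
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact scoring rule, so the sketch assumes that each node's histogram of link weights (here, Euclidean distances to all other nodes) is compared, via the JS divergence, against the pooled histogram of all link weights, and that a larger divergence means a more anomalous element. The function names `js_divergence` and `outlier_scores` and the parameter `n_bins` are hypothetical.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def outlier_scores(X, n_bins=20):
    """Score each row of X by how much its distance distribution deviates
    from the pooled distribution of all link weights.

    Assumption (not taken from the paper): the reference distribution is the
    histogram of every off-diagonal pairwise distance in the graph.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Fully connected, undirected graph: weight = Euclidean distance.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    off_diag = ~np.eye(n, dtype=bool)  # exclude self-links
    edges = np.linspace(0.0, D.max(), n_bins + 1)

    # Pooled histogram of all link weights, used as the reference.
    ref, _ = np.histogram(D[off_diag], bins=edges)

    scores = np.empty(n)
    for i in range(n):
        hist, _ = np.histogram(D[i, off_diag[i]], bins=edges)
        scores[i] = js_divergence(hist, ref)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(size=(50, 5))
    outlier = np.full((1, 5), 20.0)  # one point far from the cluster
    scores = outlier_scores(np.vstack([inliers, outlier]))
    print("index with the highest score:", int(np.argmax(scores)))
```

Building the full distance matrix takes memory quadratic in the number of elements, and the per-node histogram comparison adds further work on top of the distance computation, which is in line with the abstract's remark that the JS-based approach increases the computational cost.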

Funder

Ministerio de Ciencia e Innovación

European Commission

Institució Catalana de Recerca i Estudis Avançats

Publisher

IOP Publishing

Subject

Artificial Intelligence,Computer Networks and Communications,Computer Science Applications,Information Systems

Link

https://iopscience.iop.org/article/10.1088/2632-072X/aca94a/pdf

Reference: 20 articles.

1. Anomaly detection: a survey;Chandola;ACM Comput. Surv.,2009

2. Graph based anomaly detection and description: a survey;Akoglu;Data Min. Knowl. Discov.,2015

3. Progress in outlier detection techniques: a survey;Wang;IEEE Access,2019

4. Outlier detection: methods, models and classification;Boukerche;ACM Comput. Surv.,2020

5. Deep learning for anomaly detection: a review;Pang;ACM Comput. Surv.,2021

Cited by 2 articles.

1. Inline optimization for injection molding processes for abrupt and gradual process behavior alterations;AIP Conference Proceedings;2024

2. Multiplex key roles to disrupt criminal networks;Social Network Analysis and Mining;2023-07-20