Similarity Distribution Density: An Optimized Approach to Outlier Detection

Author:

Quan Li1ORCID,Gong Tao1ORCID,Jiang Kaida1

Affiliation:

1. College of Information Science and Technology, Donghua University, Shanghai 201620, China

Abstract

When dealing with uncertain data, traditional model construction methods often ignore or filter out noise data to improve model performance. However, this simple approach can lead to insufficient data utilization, model bias, reduced detection ability, and decreased robustness of detection models. Outliers can be considered as data that are inconsistent with other patterns at certain specific moments and are not always negative data, so their emergence is not always bad. In the process of data analysis, outliers play a crucial role in sample vector recognition, missing value processing, and model stability verification. In addition, unsupervised models have very high computation costs when recognizing outliers, especially non-parameterized unsupervised models. To solve the above problems, we used semi-supervised learning processes and used similarity as a negative selection criterion to propose a local density verification detection model (Vd-LOD). This model establishes similarity pseudo-labels for multi-label and multi-type samples, verifies the accuracy of outlier values based on local outlier factors, and increases the detector’s sensitivity to outliers. The experimental results show that under different parameter settings with varying outlier quantities, Vd-LOD outperforms other detection models in terms of the significant increase in average time consumption caused by verifying the presence of relationships, while also achieving an approximate 6% improvement in average detection accuracy.

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Reference51 articles.

1. Aggarwal, C.C. (2015). Outlier Analysis, Springer. Data Mining.

2. Outlier Detection: Methods, Models, and Classification;Boukerche;ACM Comput. Surv.,2020

3. Günnemann, N., Günnemann, S., and Faloutsos, C. (2014). Robust Multivariate Autoregression for Anomaly Detection in Dynamic Product Ratings, ACM.

4. Ben-Gal, I. (2005). Outlier Detection, Springer.

5. Braei, M., and Wagner, S. (2020). Anomaly Detection in Univariate Time-series: A Survey on the State-of-the-Art. arXiv.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3