Outlier detection for incomplete real-valued data based on inner boundary

Author:

Zhao Zhengwei1,Yang Genteng2,Li Zhaowen3

Affiliation:

1. School of Mathematics and Physics, Guangxi Minzu University, Nanning, Guangxi, P.R. China

2. School of Electronic Information, Guangxi Minzu University, Nanning, Guangxi, P.R. China

3. Key Laboratory of Complex System Optimization and Big Data Processing in Department of Guangxi Education, Yulin Normal University, Yulin, Guangxi, P.R. China

Abstract

Outlier detection is a process to find out the objects that have the abnormal behavior. It can be applied in many aspects, such as public security, finance and medical care. An information system (IS) as a database that shows relationships between objects and attributes. A real-valued information system (RVIS) is an IS whose information values are real numbers. A RVIS with missing values is an incomplete real-valued information system (IRVIS). The notion of inner boundary comes from the boundary region in rough set theory (RST). This paper conducts experiments directly in an IRVIS and investigates outlier detection in an IRVIS based on inner boundary. Firstly, the distance between two information values on each attribute of an IRVIS is introduced, and the parameter λ to control the distance is given. Then, the tolerance relations on the object set are defined according to the distance, by the way, the tolerance classes, the λ-lower and λ-upper approximations in an IRVIS are put forward. Next, the inner boundary under each conditional attribute in an IRVIS is presented. The more inner boundaries an object belongs to, the more likely it is to be an outlier. Finally, an outlier detection method in an IRVIS based on inner boundary is proposed, and the corresponding algorithm (DE) is designed, where DE means degree of exceptionality. Through the experiments base on UCI Machine Learning Repository data sets, the DE algorithm is compared with other five algorithms. Experimental results show that DE algorithm has the better outlier detection effect in an IRVIS. It is worth mentioning that for comprehensive comparison, ROC curve and AUC value are used to illustrate the advantages of the DE algorithm.

Publisher

IOS Press

Subject

Artificial Intelligence,General Engineering,Statistics and Probability

Reference37 articles.

1. Statistical Fraud Detection: A Review Comment Comment Rejoinder;Bolton;Statistical Science,2002

2. A rough-set-basedincremental approach for updating approximations under dynamicmaintenance environments;Chen;IEEE Transactions on Knowledge and Data Engineering,2013

3. Neighborhood outlier detection;Chen;Expert Systems with Applications,2010

4. Attributeselection for partially labeled categorical data by rough setapproach;Dai;IEEE Transactions on Cybernetics,2017

5. An uncertainty measure for incomplete decision tables and its applications;Dai;IEEE Transactions on Cybernetics,2013

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3