Outlier Summarization via Human Interpretable Rules

Author:

Deng Yuhao (1), Wang Yu (2), Cao Lei (3), Qiao Lianpeng (1), Wang Yuping (1), Xu Jingzhe (1), Yan Yizhou (4), Madden Samuel (5)

Affiliation:

1. Beijing Institute of Technology

2. University of California

3. University of Arizona/MIT

4. Worcester Polytechnic Institute

5. MIT

Abstract

Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results, both to reduce human effort and to convert outliers into actionable insights. However, existing methods fail to help users identify the root causes of outliers effectively: they only pinpoint data attributes, without considering that outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise, human-understandable rules that summarize and explain outlier detection results at a finer granularity. These rules consider both attributes and their associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity, and hence strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules, and it is optimal in maximizing this objective in each iteration. Moreover, to handle high-dimensional, highly complex datasets that are hard to summarize with simple rules, we propose a localized STAIR approach called L-STAIR. Taking data locality into consideration, it simultaneously partitions the data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, making them more amenable for humans to understand and evaluate.
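
To make the idea of rules over "attributes and associated values" concrete, the following is a minimal illustrative sketch in Python. It is not the STAIR algorithm or its objective: the `Predicate`, `Rule`, `classify`, and `summarize` names, the transaction data, and the thresholds are all hypothetical, chosen only to show how two rules over the same attribute can describe outliers with different causes, and how the number of rules and their total length can be measured alongside agreement with a detector's output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)
class Predicate:
    """A single attribute-value condition, e.g. amount > 5000 (hypothetical)."""
    attribute: str
    op: str  # either "<=" or ">"
    threshold: float

    def holds(self, row: Row) -> bool:
        value = row[self.attribute]
        return value <= self.threshold if self.op == "<=" else value > self.threshold


@dataclass(frozen=True)
class Rule:
    """A conjunction of predicates that assigns a label to the rows it covers."""
    predicates: Tuple[Predicate, ...]
    label: str  # "outlier" or "inlier"

    def covers(self, row: Row) -> bool:
        return all(p.holds(row) for p in self.predicates)


def classify(rules: List[Rule], row: Row, default: str = "inlier") -> str:
    """Return the label of the first rule covering the row, else the default."""
    for rule in rules:
        if rule.covers(row):
            return rule.label
    return default


def summarize(rules: List[Rule], rows: List[Row], detector_labels: List[str]) -> dict:
    """Report rule-set size, total rule length, and agreement with the detector."""
    agree = sum(classify(rules, r) == y for r, y in zip(rows, detector_labels))
    return {
        "num_rules": len(rules),
        "total_length": sum(len(rule.predicates) for rule in rules),
        "agreement": agree / len(rows),
    }


# Hypothetical transactions and the labels an outlier detector assigned to them.
rows = [
    {"amount": 120, "hour": 14},
    {"amount": 8000, "hour": 3},
    {"amount": 8000, "hour": 15},
    {"amount": 25000, "hour": 11},
    {"amount": 60, "hour": 2},
]
detector_labels = ["inlier", "outlier", "inlier", "outlier", "inlier"]

# Two rules that both involve "amount" but describe different situations.
rules = [
    Rule((Predicate("amount", ">", 5000), Predicate("hour", "<=", 5)), "outlier"),
    Rule((Predicate("amount", ">", 20000),), "outlier"),
]

summary = summarize(rules, rows, detector_labels)
print(summary)
```

In this toy setting, an attribute-only explanation would point to `amount` for both outliers, whereas the value-level rules separate a large night-time transaction from an extremely large one at any hour.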

Publisher

Association for Computing Machinery (ACM)

