Outlier Summarization via Human Interpretable Rules

Published:2024-03 Issue:7 Volume:17 Page:1591-1604
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Deng Yuhao¹, Wang Yu², Cao Lei³, Qiao Lianpeng¹, Wang Yuping¹, Xu Jingzhe¹, Yan Yizhou⁴, Madden Samuel⁵

Affiliation:

1. Beijing Institute of Technology

2. University of California

3. University of Arizona/MIT

4. Worcester Polytechnic Institute

5. MIT

Abstract

Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering that outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results at a finer granularity. These rules consider both attributes and their associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex datasets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions the data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, making them more amenable for humans to understand and evaluate.
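To make the notion of rules over "both attributes and associated values" concrete, the sketch below represents a rule as a conjunction of attribute/threshold predicates and measures a rule set by its number of rules and total predicate count. It is only an illustration under stated assumptions: the `Rule` class, the `summarize` helper, the toy transaction data, and the use of predicate count as a complexity measure are all hypothetical, and the code does not reproduce STAIR's learning algorithm or its optimization objective, which the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical representation: a predicate is (attribute, operator, threshold),
# where the operator is "<=" or ">".
Predicate = Tuple[str, str, float]


@dataclass
class Rule:
    predicates: List[Predicate]  # conjunction: all predicates must hold
    label: str                   # "outlier" or "inlier"

    def covers(self, row: Dict[str, float]) -> bool:
        """Return True if the row satisfies every predicate of this rule."""
        for attr, op, threshold in self.predicates:
            value = row[attr]
            if op == "<=":
                if not value <= threshold:
                    return False
            elif op == ">":
                if not value > threshold:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
        return True

    @property
    def length(self) -> int:
        """Number of predicates, used here as an assumed complexity measure."""
        return len(self.predicates)


def classify(rules: List[Rule], row: Dict[str, float], default: str = "inlier") -> str:
    """Label a row with the first rule that covers it, else the default label."""
    for rule in rules:
        if rule.covers(row):
            return rule.label
    return default


def summarize(rules: List[Rule], rows: List[Dict[str, float]], labels: List[str]) -> dict:
    """Report rule-set size, total predicate count, and agreement with the detector labels."""
    predicted = [classify(rules, row) for row in rows]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return {
        "num_rules": len(rules),
        "total_length": sum(rule.length for rule in rules),
        "accuracy": correct / len(rows),
    }


# Toy data (invented): two outliers in the same (amount, hour) subspace
# that have different explanations.
rows = [
    {"amount": 20.0, "hour": 14.0},
    {"amount": 35.0, "hour": 10.0},
    {"amount": 9000.0, "hour": 13.0},  # unusually large amount
    {"amount": 15.0, "hour": 3.0},     # small amount at an unusual hour
]
labels = ["inlier", "inlier", "outlier", "outlier"]

rules = [
    Rule([("amount", ">", 5000.0)], "outlier"),
    Rule([("amount", "<=", 50.0), ("hour", "<=", 5.0)], "outlier"),
]

stats = summarize(rules, rows, labels)
print(stats)
```

Both outliers lie in the same attribute subspace, yet each is captured by a different rule with different value ranges, which is the kind of distinction an attribute-only explanation cannot express.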

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.14778/3654621.3654627
