Abstract
Logical Analysis of Data is a procedure aimed at identifying relevant features in data sets containing both positive and negative samples. The goal is to build Boolean formulas, represented by strings over {0,1,-} called patterns, which can be used to classify new samples as positive or negative. Since a data set can be explained in alternative ways, many computational problems arise related to the choice of a particular set of patterns. In this paper we study the computational complexity of several of these pattern problems, showing that they are, in general, computationally hard, and we propose some integer programming models that appear to be effective. We describe an ILP model for finding a minimum-size set of patterns explaining a given set of samples, and another for the problem of determining whether two sets of patterns are equivalent, i.e., whether they explain exactly the same samples. We base our first model on a polynomial-time procedure that computes all patterns compatible with a given set of samples. Computational experiments substantiate the effectiveness of our models on fairly large instances. Finally, we conjecture that an effective ILP model for finding a minimum-size set of patterns equivalent to a given set of patterns is unlikely to exist, because the problem is both NP-hard and co-NP-hard.
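The covering relation behind these pattern problems can be illustrated with a small sketch. It assumes that a pattern explains (covers) a binary sample when every position of the pattern that is not '-' equals the sample's bit at that position. The function names covers, explained and equivalent are hypothetical. The equivalence test simply enumerates all 2^n samples of length n, so it is a brute-force illustration of the definition, not the ILP model described in the abstract.

```python
from itertools import product


def covers(pattern: str, sample: str) -> bool:
    """Assumed semantics: a pattern over {0,1,-} covers a binary sample if
    every non-'-' position of the pattern equals the sample's bit there."""
    if len(pattern) != len(sample):
        raise ValueError("pattern and sample must have the same length")
    return all(p == "-" or p == s for p, s in zip(pattern, sample))


def explained(patterns, n: int) -> set:
    """Return every binary string of length n covered by at least one pattern.
    Brute force: enumerates all 2**n candidate samples."""
    samples = ("".join(bits) for bits in product("01", repeat=n))
    return {s for s in samples if any(covers(p, s) for p in patterns)}


def equivalent(patterns_a, patterns_b, n: int) -> bool:
    """Two pattern sets are equivalent if they explain exactly the same samples.
    Only practical for small n, since the sample space has 2**n elements."""
    return explained(patterns_a, n) == explained(patterns_b, n)


if __name__ == "__main__":
    print(covers("1-0", "110"))                   # True
    print(equivalent(["0-", "1-"], ["--"], 2))    # True: both cover all 4 samples
    print(equivalent(["0-"], ["-0"], 2))          # False: {00, 01} vs {00, 10}
```

Because the enumeration grows as 2^n, this check only scales to a handful of features; it is meant to make the notion of "explaining the same samples" concrete rather than to compete with an optimization model.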
Subject
Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science