Computational Complexity and ILP Models for Pattern Problems in the Logical Analysis of Data

Author:

Lancia GiuseppeORCID,Serafini Paolo

Abstract

Logical Analysis of Data is a procedure aimed at identifying relevant features in data sets with both positive and negative samples. The goal is to build Boolean formulas, represented by strings over {0,1,-} called patterns, which can be used to classify new samples as positive or negative. Since a data set can be explained in alternative ways, many computational problems arise related to the choice of a particular set of patterns. In this paper we study the computational complexity of several of these pattern problems (showing that they are, in general, computationally hard) and we propose some integer programming models that appear to be effective. We describe an ILP model for finding the minimum-size set of patterns explaining a given set of samples and another one for the problem of determining whether two sets of patterns are equivalent, i.e., they explain exactly the same samples. We base our first model on a polynomial procedure that computes all patterns compatible with a given set of samples. Computational experiments substantiate the effectiveness of our models on fairly large instances. Finally, we conjecture that the existence of an effective ILP model for finding a minimum-size set of patterns equivalent to a given set of patterns is unlikely, due to the problem being NP-hard and co-NP-hard at the same time.

Publisher

MDPI AG

Subject

Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science

Reference18 articles.

1. Data Mining: Concepts and Techniques;Jaiwei,2011

2. Data Mining: Concepts, Models, Methods, and Algorithms;Kantardzic,2003

3. Feature Selection for Classification

4. Feature Selection for Data Mining;Felici,2006

5. Advances in Feature Selection for Data and Pattern Recognition: An Introduction Advances in Feature Selection for Data and Pattern Recognition;Stanczyk,2018

Cited by 6 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Supervised Classification Problem: Searching for Maximum Patterns;2024 X International Conference on Information Technology and Nanotechnology (ITNT);2024-05-20

2. Greedy algorithm for finding Pareto optimal patterns;AIP Conference Proceedings;2024

3. An Efficient Algorithm for K-Diagnosability Analysis of Bounded and Unbounded Petri Nets;IFAC-PapersOnLine;2024

4. Paired Patterns in Logical Analysis of Data for Decision Support in Recognition;Computation;2022-10-12

5. Efficient Process Scheduling for Multi-core Systems;2022 IEEE 8th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS);2022-05

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3