Global and saturated probabilistic approximations based on generalized maximal consistent blocks

Published: 2022-02-21 Issue: 2 Volume: 31 Page: 223-239
ISSN: 1367-0751
Container-title: Logic Journal of the IGPL
Language: en

Author:

Clark Patrick G¹, Grzymala-Busse Jerzy W², Hippe Zdzislaw S³, Mroczek Teresa³, Niemiec Rafal³

Affiliation:

1. Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA

2. Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA and Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland

3. Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland

Abstract

In this paper, incomplete data sets, i.e., data sets with missing attribute values, are given three interpretations of the missing values: lost values, attribute-concept values and ‘do not care’ conditions. The process of data mining is based on two types of probabilistic approximations, global and saturated. We present results of experiments on mining incomplete data sets using six approaches, each combining one of the three interpretations of missing attribute values with one of the two types of probabilistic approximations. We compare the six approaches using the error rate computed by ten-fold cross validation as the criterion of quality. We show that for some data sets the error rate is significantly smaller (5% level of significance) for lost values, for some data sets the smaller error rate is associated with attribute-concept values, and for others with ‘do not care’ conditions. Similarly, for some approaches the error rate is significantly smaller for saturated probabilistic approximations than for global probabilistic approximations, while for others it is the other way around. Thus, for an incomplete data set, the best approach to data mining should be chosen by trying all six approaches.
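
The selection procedure described in the abstract (enumerate the six approaches, estimate an error rate for each by ten-fold cross validation, keep the best) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `all_approaches`, `ten_fold_indices`, `cross_validated_error`, `best_approach` and the `count_errors` callback are hypothetical, the fold assignment is an assumed simple striding scheme, and the rule induction from global or saturated probabilistic approximations is left to the caller. The 5% significance test mentioned in the abstract is also not covered.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# The three interpretations and two approximation types named in the abstract.
INTERPRETATIONS = ("lost", "attribute-concept", "do-not-care")
APPROXIMATIONS = ("global", "saturated")

Approach = Tuple[str, str]


def all_approaches() -> List[Approach]:
    """Return the 3 x 2 = 6 combinations of interpretation and approximation."""
    return list(product(INTERPRETATIONS, APPROXIMATIONS))


def ten_fold_indices(n_cases: int, k: int = 10) -> List[List[int]]:
    """Split case indices 0..n_cases-1 into k nearly equal folds.

    Assumption: folds are formed by striding, without shuffling.
    """
    return [list(range(i, n_cases, k)) for i in range(k)]


def cross_validated_error(
    n_cases: int,
    count_errors: Callable[[List[int], List[int]], int],
    k: int = 10,
) -> float:
    """Estimate the error rate by k-fold cross validation.

    count_errors(train, test) is a hypothetical callback that induces rules
    from the training cases and returns how many test cases are misclassified.
    """
    wrong = 0
    for test in ten_fold_indices(n_cases, k):
        held_out = set(test)
        train = [i for i in range(n_cases) if i not in held_out]
        wrong += count_errors(train, test)
    return wrong / n_cases


def best_approach(error_rates: Dict[Approach, float]) -> Approach:
    """Pick the approach with the smallest estimated error rate."""
    return min(error_rates, key=error_rates.get)
```

In use, a caller would compute `cross_validated_error` once per element of `all_approaches()`, passing a `count_errors` callback specific to that approach, and then hand the resulting dictionary to `best_approach`.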

Publisher

Oxford University Press (OUP)

Subject

Logic

Link

https://academic.oup.com/jigpal/article-pdf/31/2/223/49705956/jzac015.pdf
