Global and saturated probabilistic approximations based on generalized maximal consistent blocks

Author:

Clark Patrick G (1), Grzymala-Busse Jerzy W (2), Hippe Zdzislaw S (3), Mroczek Teresa (3), Niemiec Rafal (3)

Affiliation:

1. Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA

2. Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA, and Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland

3. Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland

Abstract

In this paper, incomplete data sets, i.e., data sets with missing attribute values, are considered under three interpretations of the missing values: lost values, attribute-concept values and ‘do not care’ conditions. Additionally, the process of data mining is based on two types of probabilistic approximations, global and saturated. We present results of experiments on mining incomplete data sets using six approaches, each combining one of the three interpretations of missing attribute values with one of the two types of probabilistic approximations. We compare the six approaches using the error rate computed by ten-fold cross validation as the criterion of quality. We show that for some data sets the error rate is significantly smaller (5% level of significance) for lost values, for some data sets the smaller error rate is associated with attribute-concept values, and for some with ‘do not care’ conditions. Similarly, for some approaches the error rate is significantly smaller for saturated probabilistic approximations than for global probabilistic approximations, while for others it is the other way around. Thus, for a given incomplete data set, the best approach to data mining should be chosen by trying all six approaches.
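The selection procedure recommended in the abstract (try all six combinations, keep the one with the smallest cross-validated error rate) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `pick_best_approach`, the label strings, and the `error_rate` callback are assumptions introduced here. The callback is expected to return the ten-fold cross-validation error rate for a given combination, however that is computed.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Labels are hypothetical names for the categories listed in the abstract.
INTERPRETATIONS = ("lost", "attribute-concept", "do-not-care")
APPROXIMATIONS = ("global", "saturated")

Approach = Tuple[str, str]


def pick_best_approach(
    error_rate: Callable[[str, str], float],
) -> Tuple[Approach, Dict[Approach, float]]:
    """Evaluate all six (interpretation, approximation) pairs.

    `error_rate` is a caller-supplied function (an assumption of this
    sketch) returning the cross-validated error rate for one pair.
    Returns the pair with the smallest error rate and all six results.
    """
    results = {
        (interp, approx): error_rate(interp, approx)
        for interp, approx in product(INTERPRETATIONS, APPROXIMATIONS)
    }
    best = min(results, key=results.get)
    return best, results
```

The sketch only picks the smallest observed error rate; it does not test whether the difference between approaches is statistically significant, which the paper assesses at the 5% level.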

Publisher

Oxford University Press (OUP)

Subject

Logic

