VLSD—An Efficient Subgroup Discovery Algorithm Based on Equivalence Classes and Optimistic Estimate

Author:

Lopez-Martinez-Carrasco Antonio (1, 2), Juarez Jose M. (1, 2), Campos Manuel (1, 2, 3), Canovas-Segura Bernardo (1, 2)

Affiliation:

1. MedAI-Lab, University of Murcia, 30100 Murcia, Spain

2. Facultad de Informatica, Campus de Espinardo, Universidad de Murcia, 30100 Murcia, Spain

3. Murcian Bio-Health Institute (IMIB-Arrixaca), 30120 Murcia, Spain

Abstract

Subgroup Discovery (SD) is a supervised data mining technique for identifying a set of relations (subgroups) among the attributes of a dataset with respect to a target attribute. Two key components of this technique are (i) the quality measure, which is the metric used to score an extracted subgroup, and (ii) the search strategy, which determines how the search space is explored and how the subgroups are obtained. This work proposes (1) a new and efficient SD algorithm that is based on the equivalence class exploration strategy and uses pruning based on an optimistic estimate, and (2) a data structure used when implementing the algorithm to compute subgroup refinements easily and efficiently. One of the most important advantages of this algorithm is that it is easy to parallelize. We compared the performance of our SD algorithm with that of other well-known state-of-the-art SD algorithms in terms of runtime, maximum memory usage, subgroups selected, and nodes visited. The evaluation used a collection of standard, well-known datasets from the relevant literature. The results confirmed that our algorithm is more efficient than the other algorithms considered.
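To make the two components named in the abstract concrete, the following is a minimal sketch of a depth-first subgroup search with optimistic-estimate pruning. It is not the VLSD algorithm or its data structure. It assumes WRAcc as the quality measure together with the commonly used bound (tp/N)(1 - P/N) as its optimistic estimate, and all names (`wracc`, `optimistic_estimate`, `discover`, the toy dataset) are hypothetical.

```python
def wracc(tp, n, P, N):
    """Weighted relative accuracy of a subgroup covering n rows, tp of them positive."""
    return (n / N) * (tp / n - P / N) if n else 0.0


def optimistic_estimate(tp, P, N):
    """Upper bound on the WRAcc of any refinement: the best case keeps only the tp positives."""
    return (tp / N) * (1 - P / N)


def discover(rows, target, threshold):
    """Return (description, quality) pairs with WRAcc >= threshold.

    rows: list of dicts mapping attribute -> value; target: (attribute, value).
    """
    t_attr, t_val = target
    N = len(rows)
    positives = {i for i, r in enumerate(rows) if r[t_attr] == t_val}
    P = len(positives)

    # Map each selector (attribute, value) to the set of row indices it covers.
    selectors = {}
    for i, r in enumerate(rows):
        for a, v in r.items():
            if a != t_attr:
                selectors.setdefault((a, v), set()).add(i)
    ordered = sorted(selectors)  # fixed order so each description is built once

    results = []

    def expand(description, cover, start):
        for k in range(start, len(ordered)):
            sel = ordered[k]
            if any(sel[0] == a for a, _ in description):
                continue  # at most one selector per attribute
            new_cover = cover & selectors[sel]
            if not new_cover:
                continue
            tp = len(new_cover & positives)
            new_desc = description + [sel]
            q = wracc(tp, len(new_cover), P, N)
            if q >= threshold:
                results.append((new_desc, q))
            # Refine only if some refinement could still reach the threshold.
            if optimistic_estimate(tp, P, N) >= threshold:
                expand(new_desc, new_cover, k + 1)

    expand([], set(range(N)), 0)
    return results


# Toy dataset (hypothetical): the target is ill == "yes".
rows = [
    {"age": "old", "smoker": "yes", "ill": "yes"},
    {"age": "old", "smoker": "yes", "ill": "yes"},
    {"age": "old", "smoker": "no", "ill": "no"},
    {"age": "young", "smoker": "yes", "ill": "no"},
    {"age": "young", "smoker": "no", "ill": "no"},
    {"age": "young", "smoker": "no", "ill": "no"},
]
found = discover(rows, ("ill", "yes"), threshold=0.2)
```

In this sketch, branches whose covered rows contain no positive instance have an optimistic estimate of zero and are never refined, which is the kind of saving that pruning on an optimistic estimate is meant to provide.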

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Cited by 2 articles.

1. Subgroup Discovery with SD4Py;Communications in Computer and Information Science;2024

2. A scalable, distributed framework for significant subgroup discovery;Knowledge-Based Systems;2024-01
