An Efficient Method for Discretizing Continuous Attributes

Published:2010-04 Issue:2 Volume:6 Page:1-21
ISSN:1548-3924
Container-title:International Journal of Data Warehousing and Mining
language:en

Author:

Engle Kelley M.¹, Gangopadhyay Aryya¹

Affiliation:

1. University of Maryland Baltimore County, USA

Abstract

In this paper, the authors present a novel method for finding optimal split points for the discretization of continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned using a bisecting region method, which partitions the search space and returns the point with the highest information gain found during its search. The second step is a hill-climbing algorithm that starts from the point returned by the first step and greedily searches for an optimal point. The method was tested using fifteen attributes from two data sets. The results show that the method drastically reduces the number of searches while identifying optimal or near-optimal split points. On average, there was a 98% reduction in the number of information gain calculations with only a 4% reduction in information gain.
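The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes binary splits, class labels already sorted by the continuous attribute, and a simple halving rule for the bisecting step. All function names (`entropy`, `info_gain`, `bisecting_search`, `hill_climb`, `find_split`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(labels, i):
    """Information gain of splitting labels into labels[:i] and labels[i:].

    `labels` must be ordered by the value of the continuous attribute.
    """
    n = len(labels)
    left, right = labels[:i], labels[i:]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def bisecting_search(labels, gain):
    """Step 1 (assumed form): repeatedly halve the range of candidate split
    indices, keeping the half whose probe point has the higher gain.
    Returns the best index evaluated along the way."""
    lo, hi = 1, len(labels) - 1
    best = lo
    while lo < hi:
        mid = (lo + hi) // 2
        left_probe = (lo + mid) // 2
        right_probe = (mid + 1 + hi) // 2
        for probe in (left_probe, right_probe):
            if gain(probe) > gain(best):
                best = probe
        if gain(left_probe) >= gain(right_probe):
            hi = mid
        else:
            lo = mid + 1
    return best


def hill_climb(labels, gain, start):
    """Step 2: greedily move to a neighbouring index while the gain improves."""
    current = start
    while True:
        neighbours = [j for j in (current - 1, current + 1) if 1 <= j <= len(labels) - 1]
        better = max(neighbours, key=gain, default=current)
        if gain(better) <= gain(current):
            return current
        current = better


def find_split(labels):
    """Return (split_index, information_gain) for value-sorted labels."""
    if len(labels) < 2:
        raise ValueError("need at least two records to split")
    cache = {}

    def gain(i):
        # Memoize so each candidate split is scored at most once.
        if i not in cache:
            cache[i] = info_gain(labels, i)
        return cache[i]

    start = bisecting_search(labels, gain)
    best = hill_climb(labels, gain, start)
    return best, gain(best)
```

Calling `find_split(labels)` on labels sorted by attribute value returns a split index and its gain. Because each candidate is scored at most once and only a few probe points are visited, far fewer gain calculations are needed than in an exhaustive scan of all `len(labels) - 1` candidates. Like any greedy search, the sketch may settle on a near-optimal rather than optimal point.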

Publisher

IGI Global

Subject

Hardware and Architecture, Software

References: 52 articles.

1. Asuncion, A., & Newman, D. J. (2007). UCI machine learning repository. Retrieved from http://www.ics.uci.edu/~mlearn/MLRepository.html

2. Bagui, S. (2006). An approach to mining crime patterns. International Journal of Data Warehousing and Mining.

3. Baruch Options Data Warehouse. (2008). Subotnik Financial Services Center. Retrieved

4. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1998). Megainduction: machine learning on very large databases. Sydney, Australia: University of Sydney.

5. Catlett, J. (1991). On changing continuous attributes into ordered discrete attributes. Paper presented at the Proceedings of the European Working Session on Learning, Berlin, Germany.

Cited by 7 articles.

1. Survey:Time-series data preprocessing: A survey and an empirical analysis;Journal of Engineering Research;2024-03

2. Data Field for Hierarchical Clustering;Developments in Data Extraction, Management, and Analysis;2013

3. Spatial Data Mining for Highlighting Hotspots in Personal Navigation Routes;International Journal of Data Warehousing and Mining;2012-07

4. Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces;International Journal of Data Warehousing and Mining;2012-04

5. Mining Hierarchical Negative Association Rules;International Journal of Computational Intelligence Systems;2012