Empirical Comparison of Various Discretization Procedures

Published: 1998-11  Volume: 12  Issue: 07  Pages: 1017-1032
ISSN: 0218-0014
Container title: International Journal of Pattern Recognition and Artificial Intelligence
Language: en
Short container title: Int. J. Patt. Recogn. Artif. Intell.

Authors:

Berka Petr¹, Bruha Ivan²

Affiliations:

1. Laboratory of Intelligent Systems, Prague University of Economics, W. Churchill Sq. 4, Prague CZ-13067, Czech Republic

2. Department of Computer Science and Systems, McMaster University, Hamilton, Ont., Canada L8S 4K1

Abstract

Genuinely symbolic machine learning (ML) algorithms can process only symbolic (categorical) data. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML, and quite a few discretization procedures exist in the field. This paper describes two newer algorithms for the categorization (discretization) of numerical attributes. The first one is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. It discretizes the numerical attributes in such a way that the resulting categorization corresponds to the KEX knowledge acquisition algorithm. Since the categorization for KEX is done "off-line", before the KEX machine learning algorithm is run, it can be used as a preprocessing step for other machine learning algorithms, too. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm. Here, the range of each numerical attribute is divided into intervals that may form a complex generated by the algorithm as part of the class description. Experimental results compare the performance of KEX and CN4 on several well-known ML databases. To make the comparison more informative, we also used the discretization procedure of the MLC++ library, and other ML algorithms such as ID3 and C4.5 were included in our experiments as well. The results are then compared and discussed.
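To make the notion of discretizing a numerical attribute concrete, here is a minimal Python sketch that turns one numerical attribute into interval categories. It is not the KEX, CN4, or MLC++ procedure, none of which the abstract specifies. The helper names `boundary_cuts` and `discretize` are hypothetical, and the cut-point rule (cut midway between two neighbouring distinct values unless both carry the same single class) is an assumed, simple supervised heuristic chosen only for illustration.

```python
from bisect import bisect_left


def boundary_cuts(values, labels):
    """Hypothetical helper: propose cut points for one numerical attribute.

    Assumed heuristic (illustration only, not KEX or CN4): group examples by
    distinct value, then cut midway between two neighbouring values unless
    both are pure and carry the same single class.
    """
    classes_at = {}
    for v, c in zip(values, labels):
        classes_at.setdefault(v, set()).add(c)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        left, right = classes_at[lo], classes_at[hi]
        # Keep the boundary whenever class membership may change here.
        if not (len(left) == 1 and left == right):
            cuts.append((lo + hi) / 2)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to an interval label (lo, hi], or (lo, +inf) for the last interval."""
    i = bisect_left(cuts, value)  # number of cuts strictly below value
    lo = "-inf" if i == 0 else f"{cuts[i - 1]:g}"
    if i == len(cuts):
        return f"({lo}, +inf)"
    return f"({lo}, {cuts[i]:g}]"


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = ["no", "no", "yes", "yes", "no", "no"]
    cuts = boundary_cuts(x, y)
    print(cuts)  # [2.5, 4.5]
    print([discretize(v, cuts) for v in x])
```

With cuts at 2.5 and 4.5, the value 3.0 becomes the category `(2.5, 4.5]`, which a symbolic learner can treat like any other categorical value. A practical procedure would usually merge or prune such candidate cuts according to some quality criterion rather than keep every class change.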

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001498000567

Cited by 13 articles

1. Using the LISP-Miner System for Credit Risk Assessment; Neural Network World; 2016

2. Discretizing Numerical Attributes in Decision Tree for Big Data Analysis; 2014 IEEE International Conference on Data Mining Workshop; 2014-12

3. Discretization; Intelligent Systems Reference Library; 2014-08-31

4. Data accuracy's impact on segmentation performance: Benchmarking RFM analysis, logistic regression, and decision trees; Journal of Business Research; 2014-01

5. A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning; IEEE Transactions on Knowledge and Data Engineering; 2013-04