Abstract
The main objective of this paper is to investigate a new probabilistic supervised learning approach that incorporates "missingness" into the splitting criterion of a decision tree classifier at each attribute node, and to evaluate it in terms of predictive accuracy for software development effort. The proposed approach is compared empirically with ten supervised learning methods (classifiers) that have mechanisms for dealing with missing values, using ten industrial datasets. Overall, missing incorporated in attributes 3 is the top-performing strategy, followed by C4.5, missing incorporated in attributes, missing incorporated in attributes 2, missing incorporated in attributes, linear discriminant analysis, and so on. Classification and regression trees and C4.5 performed well on data with high correlations among attributes, while k-nearest neighbour and support vector machines performed well on data with higher complexity (a limited number of instances). The worst-performing method is repeated incremental pruning to produce error reduction.
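The following is a minimal sketch of how "missingness" can be folded into a split criterion, assuming the commonly described form of missing incorporated in attributes (MIA) for a single numeric attribute: each candidate threshold is scored with the missing cases sent left, then sent right, and a third option splits purely on missing versus observed. The Gini impurity, the function names (`gini`, `weighted_impurity`, `mia_best_split`) and the rule labels are illustrative assumptions; the sketch does not reproduce the paper's own criterion or its MIA 2 and MIA 3 variants.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_impurity(left: Sequence[str], right: Sequence[str]) -> float:
    """Size-weighted Gini impurity of a binary partition."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def mia_best_split(
    x: Sequence[Optional[float]], y: Sequence[str]
) -> Tuple[float, str, Optional[float]]:
    """Return (impurity, rule, threshold) of the best MIA-style split.

    Hypothetical rule labels:
      'missing_left'        : observed x <= t or x missing -> left, else right
      'missing_right'       : observed x <= t -> left, x > t or missing -> right
      'missing_vs_observed' : x missing -> left, x observed -> right (no threshold)
    """
    observed = sorted({v for v in x if v is not None})
    missing_y = [yi for xi, yi in zip(x, y) if xi is None]
    best: Tuple[float, str, Optional[float]] = (math.inf, "none", None)

    # Option 3: split on missingness alone (only if both sides are non-empty).
    if missing_y and len(missing_y) < len(y):
        obs_y = [yi for xi, yi in zip(x, y) if xi is not None]
        best = (weighted_impurity(missing_y, obs_y), "missing_vs_observed", None)

    # Options 1 and 2: threshold at midpoints of consecutive observed values,
    # with the missing cases attached to either the left or the right branch.
    for a, b in zip(observed, observed[1:]):
        t = (a + b) / 2
        le = [yi for xi, yi in zip(x, y) if xi is not None and xi <= t]
        gt = [yi for xi, yi in zip(x, y) if xi is not None and xi > t]
        for rule, left, right in (
            ("missing_left", le + missing_y, gt),
            ("missing_right", le, gt + missing_y),
        ):
            score = weighted_impurity(left, right)
            if score < best[0]:  # strict: ties keep the earlier candidate
                best = (score, rule, t)
    return best


if __name__ == "__main__":
    x = [1.0, 2.0, None, 8.0, 9.0, None]
    y = ["low", "low", "high", "high", "high", "high"]
    print(mia_best_split(x, y))  # (0.0, 'missing_right', 5.0)
```

In this toy example the missing cases share the class of the high-valued cases, so the best rule attaches them to the right branch at threshold 5.0 and yields pure children. Treating "where the missing values go" as part of the split search, rather than imputing them beforehand, is what lets the tree exploit missingness that is itself informative.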
Publisher
Fuji Technology Press Ltd.
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction