Authors:
Blomberg Luciano C., Ruiz Duncan Dubugras A.
Abstract
This paper analyzes how missing data affects traditional classification algorithms in data mining applications. For this purpose, we use ten UCI datasets and manipulate them to hold controlled levels of missing data. Our empirical analysis shows that classification performance decreases in all tested datasets once a significant amount of missing values is inserted. Among the analyzed algorithms, Naïve Bayes is the least influenced by missing data, followed by SMO. IBK is the most influenced, presenting the lowest accuracy, predominantly in datasets whose independent variables are continuous.
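As a rough illustration of what "controlled levels of missing data" could look like in practice, the following minimal Python sketch blanks out a fixed fraction of feature cells uniformly at random. The abstract does not state the authors' actual procedure, so the uniform-random mechanism, the function name inject_missing, the rate and seed parameters, and the toy data are all assumptions made for illustration only.

```python
import random


def inject_missing(rows, rate, seed=0):
    """Return a copy of `rows` with round(rate * n_cells) cells set to None.

    Assumption: cells are chosen uniformly at random, and `rows` holds only
    the independent variables (class labels are kept elsewhere).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    damaged = [list(row) for row in rows]  # copy so the input stays intact
    cells = [(i, j) for i, row in enumerate(damaged) for j in range(len(row))]
    n_missing = round(rate * len(cells))
    # sample without replacement so exactly n_missing distinct cells are hit
    for i, j in rng.sample(cells, n_missing):
        damaged[i][j] = None
    return damaged


# Hypothetical toy data: four instances, three continuous attributes.
data = [[5.1, 3.5, 1.4], [4.9, 3.0, 1.4], [6.2, 2.9, 4.3], [5.9, 3.0, 5.1]]
for level in (0.0, 0.25, 0.5):
    damaged = inject_missing(data, level, seed=42)
    n_none = sum(value is None for row in damaged for value in row)
    print(f"level={level:.2f} missing_cells={n_none}")
```

Each damaged copy could then be passed to the classifiers under study, with accuracy compared across missing-data levels. Fixing the seed keeps the comparison repeatable.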
Publisher
Sociedade Brasileira de Computação
Cited by
2 articles.