Affiliation:
1. Department of Applied Mathematics, Feng Chia University, Taichung City, Taiwan
2. Department of Marketing, National Chung Hsing University, Taichung City, Taiwan
Abstract
Missing values are common, but handling them with an inappropriate method may lead to large classification errors. Empirical evidence shows that tree‐based classification algorithms such as classification and regression trees (CART) can benefit from imputation, especially multiple imputation. Nevertheless, little attention has been paid to incorporating multiple imputation into cost‐sensitive decision tree induction. This study focuses on the treatment of missing data in a time‐constrained minimal‐cost tree algorithm. We incorporate several approaches to handling incomplete data into the algorithm: complete‐case analysis, missing‐value branch, single imputation, feature acquisition, and multiple imputation. A simulation study under different scenarios examines the predictive performance of the proposed strategies. The simulation results show that combining the algorithm with multiple imputation can ensure classification accuracy within the budget. A real medical data example provides insights into the problem of missing values in cost‐sensitive learning and the advantages of the proposed methods.