Abstract
The author first reviews the literature to identify the factors affecting life expectancy, then plots a correlation graph to check for multicollinearity. Next, the dataset collected for this paper is split into a training set (70%) and a test set (30%). Forecast accuracy is then evaluated with two methods, Logistic Regression and KNN, both before and after dropping the variable that is highly correlated with the others and has low statistical significance. The accuracies of the four models, Logit (1), Logit (2), KNN (1) and KNN (2), are 0.8936, 0.8723, 0.8511 and 0.8723, respectively. The author's conclusions are as follows: (1) for Logistic Regression, a lack of information is a major factor that affects accuracy; (2) for KNN, removing one or more highly correlated explanatory variables can improve prediction; (3) overall, Logistic Regression has slightly higher accuracy than KNN. A possible explanation is that KNN requires a larger sample size to avoid misclassification, and that the best K is chosen by cross-validation experience rather than by sound statistical theory.
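As an illustration of the evaluation procedure summarized above, the following is a minimal sketch of a 70/30 split followed by KNN classification with K chosen by cross-validation on the training set. It is not the author's code: the data are synthetic rather than the paper's dataset, every function name (`make_data`, `split`, `knn_predict`, `accuracy`, `choose_k`) is hypothetical, and the candidate values of K and the number of folds are assumptions. The Logistic Regression models and the step that drops the correlated variable are not shown.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic binary data (assumption): label is 1 when the two features sum to > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        rows.append((x, 1 if x[0] + x[1] > 0 else 0))
    return rows

def split(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x))
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def choose_k(train, candidates=(1, 3, 5, 7, 9), folds=5):
    """Return the candidate k with the highest mean accuracy over the folds."""
    best_k, best_score = None, -1.0
    for k in candidates:
        scores = []
        for f in range(folds):
            val = train[f::folds]  # every folds-th row, starting at f
            fit = [r for i, r in enumerate(train) if i % folds != f]
            scores.append(accuracy(fit, val, k))
        mean = sum(scores) / folds
        if mean > best_score:
            best_k, best_score = k, mean
    return best_k

rows = make_data()
train, test = split(rows)          # 70% training, 30% test
k = choose_k(train)                # K selected on the training set only
test_accuracy = accuracy(train, test, k)
print(f"chosen k = {k}, test accuracy = {test_accuracy:.4f}")
```

Only odd values of K are tried so that a two-class vote cannot tie. The test set is used once, after K has been fixed, so the reported accuracy is not influenced by the choice of K.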