K-nearest neighbor (KNN) imputation was used to fill in missing values for five indicators: coronary heart disease, diabetes, total cholesterol, triglycerides, and albumin. The SMOTE algorithm was then used to balance the class distribution of the samples. The Relief-F algorithm removed 18 variable indicators and retained 42, whereas the LASSO and ridge regression algorithms removed eight and retained 52. The linear-kernel support vector machine model trained on features selected by Relief-F and LASSO achieved high prediction accuracy, recall, and AUC, the best results among the support vector machine models. On the test set, the random forest model trained on features selected by Relief-F and LASSO outperformed the support vector machine model. It is concluded that the random forest model with Relief-F feature selection is the better predictor of lung cancer type. These results provide a theoretical basis and supporting data for predicting lung cancer classification using machine learning methods.
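
The two preprocessing steps named above, K-nearest neighbor filling of missing values and SMOTE oversampling, can be illustrated with a minimal standard-library sketch. This is not the implementation used in the study. The function names `knn_impute` and `smote_like`, the parameter `k`, the distance definition, and the toy data are all assumptions made for illustration, and a real analysis would more plausibly rely on an established library.

```python
import math
import random


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Assumption: distance is the root mean squared difference over the
    columns that are observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor row must have column j observed
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours.

    Assumption: all features are numeric and there are at least two
    minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Hypothetical toy data: three numeric indicators, one missing value.
rows = [[1.0, 2.0, 10.0], [1.1, 2.1, None], [5.0, 6.0, 50.0], [0.9, 1.9, 12.0]]
print(knn_impute(rows, k=2)[1])  # [1.1, 2.1, 11.0]
```

The missing value is filled from the two rows closest on the observed indicators, so the distant third row does not influence it. The oversampling function only ever produces points on line segments between existing minority samples, which is the core idea behind SMOTE.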