Abstract
Machine learning models can be used in the dairy industry to predict milk yield in dairy cattle, which can increase farm efficiency and support early culling decisions based on 305-day milk yield. The performance of multiple linear regression (MLR), random forest (RF), gradient boosting regression (GBR), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) models was evaluated on the basis of root mean square error (RMSE) and coefficient of determination (R2). The RMSE values for MLR, RF, GBR, XGBoost and LightGBM were 478.82, 176.52, 229.65, 271.44 and 214.97 for the training period and 469.02, 267.13, 288.10, 338.36 and 293.80 for the testing period, respectively. Similarly, the R2 values were 0.76, 0.92, 0.86, 0.81 and 0.88 for the training period and 0.55, 0.85, 0.82, 0.76 and 0.82 for the testing period, respectively. These results suggest that the RF, LightGBM, GBR and XGBoost models were adequately accurate and precise in predicting first-lactation 305-day milk yield. RF gave the best results in both the training and testing periods, with the lowest RMSE and the highest R2, and so outperformed the other regression models. Accuracy and precision could be improved further by including more independent variables that are highly correlated with the dependent variable and by increasing the number of observations.
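For readers less familiar with the two evaluation criteria, the following minimal sketch shows how RMSE and R2 are commonly computed from observed and predicted yields. It assumes the usual definition R2 = 1 - SS_res / SS_tot; the abstract does not state which formula the study used. The function names and the toy yield values are hypothetical and are not taken from the study's data.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy 305-day milk yields (illustrative only, not from the study).
observed_yield = [2000.0, 2500.0, 3000.0, 3500.0]
predicted_yield = [2100.0, 2400.0, 3100.0, 3400.0]

print(f"RMSE = {rmse(observed_yield, predicted_yield):.2f}")
print(f"R2   = {r_squared(observed_yield, predicted_yield):.3f}")
```

A lower RMSE means smaller prediction errors in the units of the target, while an R2 closer to 1 means the model explains more of the variance in observed yield. Comparing both metrics between the training and testing periods, as the abstract does, helps reveal overfitting.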