Affiliation:
1. Department of Statistics Qingdao University of Technology Qingdao China
3. Department of Biological Sciences University of Texas at El Paso El Paso Texas USA
3. Department of Forestry and Natural Resources Purdue University West Lafayette Indiana USA
4. Department of Statistics and Probability Michigan State University East Lansing Michigan USA
Abstract
Modeling ecological patterns and processes often involves large‐scale, complex, high‐dimensional spatial data. Because ecological data are nonlinear and multicollinear, traditional geostatistical methods struggle to achieve high model accuracy. As machine learning has increased our ability to build models on big data, the main focus of this study is to propose statistical models that hybridize machine learning and spatial interpolation methods to cope with increasingly large‐scale and complex ecological data. Here, two machine learning algorithms, boosted regression tree (BRT) and least absolute shrinkage and selection operator (LASSO), were combined with ordinary kriging (OK) to model plant invasions across the eastern United States. The accuracies of the hybrid and conventional models were evaluated by 10‐fold cross‐validation. Based on an invasive plant dataset covering 15 ecoregions across the eastern United States, the hybrid algorithms predicted plant invasion significantly better than commonly used algorithms in terms of RMSE and a paired‐samples t‐test (p < .0001). In addition, the combined algorithms can select influential variables associated with the establishment of invasive cover, which conventional geostatistics cannot do. Higher accuracy in predicting large‐scale biological invasions improves our understanding of the ecological conditions that lead to the establishment and spread of plants into novel habitats across spatial scales. The results demonstrate the effectiveness and robustness of the hybrid models BRTOK and LASOK, which can be used to analyze large‐scale and high‐dimensional spatial datasets and offer an additional source of models for spatial interpolation of ecological properties. They also provide a better basis for management decisions in early‐detection modeling of invasive species.
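
As a rough illustration of the evaluation described above, the sketch below computes RMSE and a paired‐samples t statistic for two models compared over 10 cross‐validation folds. It is a minimal sketch, not the authors' code: the per‐fold RMSE values are made‐up placeholders, the function names are hypothetical, and pairing the comparison by fold is an assumption, since the abstract does not state what the pairs are.

```python
import math
from statistics import mean, stdev


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def paired_t_statistic(a, b):
    """Paired-samples t statistic for two equally long samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences
    and sd is the sample standard deviation (n - 1 in the denominator).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Hypothetical per-fold RMSE values (placeholders, NOT results from the study).
conventional_rmse = [0.92, 0.88, 0.95, 0.90, 0.93, 0.89, 0.91, 0.94, 0.87, 0.90]
hybrid_rmse = [0.81, 0.79, 0.83, 0.80, 0.84, 0.78, 0.82, 0.83, 0.77, 0.80]

# A positive t statistic means the conventional model has the larger error.
t_stat = paired_t_statistic(conventional_rmse, hybrid_rmse)
degrees_of_freedom = len(conventional_rmse) - 1
print(f"t = {t_stat:.2f} with {degrees_of_freedom} degrees of freedom")
```

The t statistic would then be compared against a t distribution with n − 1 degrees of freedom to obtain a p‐value; that step is omitted here to keep the sketch within the standard library.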