Abstract
The ability to select statistical models based on how well they fit an empirical dataset is a central tenet of modern bioscience. How well this works, though, depends on how goodness-of-fit is measured. Likelihood and its derivatives (e.g. AIC) are popular and powerful tools for measuring goodness-of-fit, but they inherently make assumptions about the data. One such assumption is the absence of error on the x-axis (i.e. no error in the predictor). This, however, is often incorrect, and deviations from this assumption are often hard (or impossible) to measure.

Here, we show that, when predictor error is present, goodness-of-fit as perceived using likelihood will increase with decreases in sample size, effect size, predictor error and predictor variance. As a result, predictors with greater effect size, predictor variance or predictor error are punished, and we suggest that larger effect sizes are biased against in likelihood-based model comparison. Of note: (i) this problem is exacerbated in datasets with larger sample sizes and a broader range of predictor values, which are typically considered desirable in biological data collection; and (ii) the magnitude of this effect is non-trivial, given that ‘proxy error’ (caused by using correlates of a predictor rather than the predictor itself) can lead to unexpectedly high amounts of error.

We investigate the consequences of our findings in an empirical dataset of wood anemone (Anemone nemorosa) first flowering date regressed against temperature. Our results show that the proxy error caused by using air temperature rather than ground temperature results in a ∆AIC of around 3. We also demonstrate potential consequences for model selection procedures with autocorrelation (e.g. ‘sliding window’ approaches). Via simulation we show that, in the presence of predictor error, AIC will favour autocorrelated, lower effect size predictors (such as those found on the edges of predictive windows) rather than the a priori specified ‘true’ window.

Our results suggest significant and far-reaching implications for biological inference with model selection in much of today’s ecology, which relies on observational data collected under non-experimental conditions. We assert that no obvious, globally applicable solution to this problem exists, and propose that quantifying predictor error is key to accurate ecological model selection going forward.
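To make the central claim concrete, the following is a minimal simulation sketch (not the authors' code; the effect size, error variances and sample size are assumed purely for illustration). It fits the same linear model to an error-free predictor and to a noisy proxy of that predictor, and compares the AIC values implied by the ordinary least-squares likelihood.

```python
# Illustrative sketch (assumed parameters): predictor error worsens AIC
# even though the underlying effect on the response is identical.
import numpy as np

rng = np.random.default_rng(1)
n, beta = 200, 0.5                        # assumed sample size and effect size
x_true = rng.normal(0, 1, n)              # error-free predictor ('ground temperature' analogue)
x_proxy = x_true + rng.normal(0, 1, n)    # noisy proxy ('air temperature' analogue)
y = beta * x_true + rng.normal(0, 1, n)   # response, e.g. first flowering date

def ols_aic(x, y):
    """AIC of a simple linear regression y ~ x (intercept, slope, residual variance)."""
    X = np.column_stack([np.ones_like(x), x])
    _, rss = np.linalg.lstsq(X, y, rcond=None)[:2]
    n = len(y)
    sigma2 = rss[0] / n                                   # ML estimate of residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # Gaussian log-likelihood at the MLE
    return 2 * 3 - 2 * loglik                             # k = 3 estimated parameters

print("AIC, error-free predictor:", round(ols_aic(x_true, y), 1))
print("AIC, noisy proxy predictor:", round(ols_aic(x_proxy, y), 1))
```

Under these assumed settings the noisy proxy returns a noticeably higher AIC than the error-free predictor, mirroring the air-temperature versus ground-temperature comparison described above.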
Publisher
Cold Spring Harbor Laboratory