Ranking earthquake forecasts using proper scoring rules: binary events in a low probability environment

Published:2022-03-28 Issue:2 Volume:230 Page:1419-1440
ISSN:0956-540X
Container-title:Geophysical Journal International
language:en

Author:

Serafini Francesco¹, Naylor Mark¹, Lindgren Finn², Werner Maximilian J³, Main Ian¹

Affiliation:

1. School of Geosciences, University of Edinburgh, Drummond St, Edinburgh EH8 9XP, United Kingdom

2. School of Mathematics, University of Edinburgh, Edinburgh EH9 3FD, United Kingdom

3. School of Earth Sciences, University of Bristol, Bristol BS8 1RL, United Kingdom

Abstract

Operational earthquake forecasting for risk management and communication during seismic sequences depends on our ability to select an optimal forecasting model. To do this, we need to compare the performance of competing models in prospective experiments and to rank them according to the outcome using a fair, reproducible and reliable method, usually in a low-probability environment. The Collaboratory for the Study of Earthquake Predictability conducts prospective earthquake forecasting experiments around the globe. In this framework, it is crucial that the metrics used to rank the competing forecasts are ‘proper’, meaning that, on average, they prefer the data-generating model. We prove that the Parimutuel Gambling score, proposed, and in some cases applied, as a metric for comparing probabilistic seismicity forecasts, is in general ‘improper’. In the special case where it is proper, we show it can still be used improperly. We demonstrate these conclusions both analytically and graphically, providing a set of simulation-based techniques that can be used to assess whether a score is proper. These techniques require only a data-generating model and at least two forecasts to be compared. We compare the Parimutuel Gambling score’s performance with two commonly used proper scores (the Brier and logarithmic scores), using confidence intervals to account for the uncertainty around the observed score difference. We suggest that confidence intervals enable a rigorous approach to distinguishing between the predictive skills of candidate forecasts, in addition to ranking them. Our analysis shows that the Parimutuel Gambling score is biased, and that the direction of the bias depends on the forecasts taking part in the experiment. Our findings suggest that the Parimutuel Gambling score should not be used to distinguish between multiple competing forecasts, and that care should be taken when only two are being compared.
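The propriety criterion above (on average, the data-generating model is preferred) can be checked exactly for a single binary bin, and the confidence-interval idea can be tried on simulated bins. The Python sketch below is an illustration under stated assumptions, not the authors' code: all scores are oriented so that higher is better, the Parimutuel Gambling (PG) score follows the common formulation in which each forecaster stakes one unit split between occurrence and non-occurrence, and `mean_difference_ci` is a hypothetical normal-approximation helper rather than the paper's exact interval procedure. The probabilities used (0.01, 0.005, 0.02) are arbitrary illustrative values.

```python
import numpy as np

# Assumption: every score below is oriented so that HIGHER is better.


def brier(p, x):
    """Brier score for a binary event, sign-flipped so higher is better: -(p - x)^2."""
    return -(np.asarray(p, dtype=float) - x) ** 2


def log_score(p, x):
    """Logarithmic score: log(p) if the event occurred, log(1 - p) otherwise."""
    p = np.asarray(p, dtype=float)
    return np.where(np.asarray(x) == 1, np.log(p), np.log1p(-p))


def parimutuel(probs, x):
    """Parimutuel Gambling score of each of N forecasts in one binary bin.

    Assumed formulation: each forecaster stakes one unit, betting p on
    occurrence and 1 - p on non-occurrence; the pot of N units is shared
    among bets on the observed outcome in proportion to their size.
    Only a scalar outcome x is supported here.
    """
    probs = np.asarray(probs, dtype=float)
    bets = probs if x == 1 else 1.0 - probs
    return -1.0 + probs.size * bets / bets.sum()


def expected_score(score, probs, pi):
    """Exact expectation of a per-forecast score when X ~ Bernoulli(pi)."""
    return pi * score(probs, 1) + (1 - pi) * score(probs, 0)


def mean_difference_ci(score, p_a, p_b, outcomes, z=1.96):
    """Hypothetical normal-approximation CI for the mean per-bin score
    difference S(p_a) - S(p_b); z = 1.96 gives roughly 95% coverage.
    Intended for Brier and log scores, which score each bin independently."""
    outcomes = np.asarray(outcomes)
    diff = score(p_a, outcomes) - score(p_b, outcomes)
    half_width = z * diff.std(ddof=1) / np.sqrt(diff.size)
    return diff.mean() - half_width, diff.mean() + half_width


if __name__ == "__main__":
    pi = 0.01                                # data-generating probability per bin
    pool = np.array([0.01, 0.005, 0.02])     # index 0 is the data-generating model
    for name, s in [("Brier", brier), ("log", log_score), ("PG", parimutuel)]:
        best = int(np.argmax(expected_score(s, pool, pi)))
        print(f"{name}: highest expected score -> forecast {best}")
    # Brier and log pick forecast 0; PG picks forecast 1 (p = 0.005).

    rng = np.random.default_rng(42)
    outcomes = rng.binomial(1, pi, size=100_000)   # simulated independent bins
    print(mean_difference_ci(log_score, 0.01, 0.02, outcomes))
```

Under this formulation, for a fixed pool of forecasts the expected PG score is linear in each forecast's own probability, so in general the most extreme forecast in the pool (the lowest or the highest, depending on the pool) comes out on top rather than the one closest to the data-generating probability. This is consistent with the bias direction depending on the participating forecasts. With only two forecasts in the pool, the data-generating model's expected PG score in this sketch is non-negative, which matches the special case in which the score is proper.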

Funder

European Union

Southern California Earthquake Center

NSF

Publisher

Oxford University Press (OUP)

Subject

Geochemistry and Petrology, Geophysics

Link

https://academic.oup.com/gji/advance-article-pdf/doi/10.1093/gji/ggac124/43110909/ggac124.pdf