Abstract
Small datasets comprising observations made under conditions of repeatability or of reproducibility pervade the practice of measurement science. Many laboratories typically make only one determination, occasionally two, and only rarely three or more replicate determinations of the same measurand. Interlaboratory comparisons (including key comparisons) and meta-analyses often involve only a handful of participants. These limitations pose considerable challenges to the production of reliable uncertainty evaluations. This contribution, intended for metrologists, describes techniques that may be employed to address these challenges either when the only information in hand is what those few observations provide, or when there is also preexisting knowledge about the measurement procedure or about the measurand. Although the technical details vary, the key message is consistently the same: there is no universal solution to the challenges raised by small datasets, and if a measurand is worth measuring, then the observations deserve a customized treatment responsive to the peculiarities of the case, and a level of effort sufficient to render the final result fit for its intended purpose. The focus is on the measurement of scalar measurands, as in the Guide to the Expression of Uncertainty in Measurement (GUM), but the range of measurement models considered is much wider than the GUM entertains. We review the advantages of the Hodges–Lehmann estimator as a general-purpose replacement for the arithmetic average in all cases where the replicated observations are approximately symmetrically distributed around a central, typical value. We illustrate the application of empirical Bayes methods to uncertainty evaluations, in particular in the context of data reductions of small datasets.
Metrologists who are skeptical about the use of subjective prior distributions may derive some value from this novel application, and thereby develop an appreciation for how Bayesian procedures can help address the challenges posed by small datasets. The estimates of the measurand that different approaches produce often agree, at least approximately, but the corresponding uncertainty quantifications may differ markedly. In one example, involving three observations, a Bayesian approach yields a coverage interval appreciably narrower than the one the GUM’s approach produces. In another example, involving only two observations, an approach involving far less restrictive assumptions than those made in the GUM produces a confidence interval that is almost as narrow as the conventional interval.
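For readers unfamiliar with the Hodges–Lehmann estimator, the following minimal sketch computes it in its usual one-sample form, as the median of all pairwise (Walsh) averages of the observations, each observation also being paired with itself. The function name `hodges_lehmann` and the three sample values are hypothetical, chosen only for illustration; they are not taken from the examples discussed in this contribution.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def hodges_lehmann(observations):
    """One-sample Hodges-Lehmann estimate: median of the Walsh averages.

    The Walsh averages are (x_i + x_j) / 2 for all pairs with i <= j,
    so each observation is also averaged with itself.
    """
    values = list(observations)
    if not values:
        raise ValueError("at least one observation is required")
    walsh_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(values, 2)
    ]
    return median(walsh_averages)


# Hypothetical replicate determinations, one of them far from the others.
replicates = [1.0, 2.0, 10.0]
print("arithmetic average:", mean(replicates))
print("Hodges-Lehmann estimate:", hodges_lehmann(replicates))
```

With these illustrative values the discordant third observation pulls the Hodges–Lehmann estimate less than it pulls the arithmetic average, which is the kind of behavior that motivates considering it for small sets of approximately symmetrically distributed replicates.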