Abstract
Measurement science is particularly well equipped not only to meet reproducibility challenges arising within the field of metrology, but also to suggest strategies and best practices for meeting such challenges in other fields. This contribution illustrates three such challenges, in three different fields, and proposes ways to address them that can supplement the only way in which reproducibility challenges in science can be resolved definitively: by validated scientific advances that point toward the truth. The first example concerns a large international interlaboratory comparison, carried out in the 1940s, of measurements of the mass fraction of silica in a granite reference material made using classical methods of wet analytical chemistry. The results delivered a shock to analysts worldwide about the state of the art at the time. The challenge was magnified by the fact that none of the measured values was qualified with an evaluation of measurement uncertainty. We present an approach, developed by Andrew Rukhin of NIST, for computing a meaningful consensus value in such a case, and explain how the associated uncertainty can be characterized. The second example concerns the currently much-debated Hubble tension: the mutual inconsistency of the measurement results, obtained by different methods, for the Hubble-Lemaître constant, which expresses the rate of expansion of the Universe. We suggest that this tension can be quantified in terms of the dark uncertainty that figures as a parameter in a laboratory random effects model, thus providing an objective metric by which progress toward resolving the tension can be gauged. The third example discusses two sources of lack of reproducibility: on the one hand, different laboratories produced strikingly discrepant values for the mass fraction of arsenic in kudzu; on the other hand, different models can be fitted to these data, each producing its own set of results.
Here we use a Bayesian model selection criterion to choose one from among four models that are natural candidates to address this double reproducibility challenge. This third example also affords us the opportunity to deflate two widespread myths: that one needs at least four observations to obtain a Bayesian evaluation of standard uncertainty, and that sample standard deviations of small samples are systematically too small.
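To make the notion of dark uncertainty concrete, the following is a minimal sketch of a laboratory random effects model, x_j = mu + lambda_j + eps_j, in which the laboratory effects lambda_j have standard deviation tau (the dark uncertainty) and the errors eps_j have the reported standard uncertainties u_j. The sketch uses the classical DerSimonian-Laird moment estimator, which is a standard technique chosen here for brevity and is not necessarily the procedure used in this contribution. The function name and the numerical inputs are hypothetical and serve illustration only.

```python
import math


def dersimonian_laird(values, std_uncertainties):
    """Estimate the consensus value mu and the dark uncertainty tau.

    Assumed model: x_j = mu + lambda_j + eps_j, with sd(lambda_j) = tau
    and sd(eps_j) = u_j. This is the DerSimonian-Laird moment estimator,
    used here as an illustrative stand-in.
    """
    k = len(values)
    # Weights that ignore dark uncertainty (fixed effects model).
    w = [1.0 / u**2 for u in std_uncertainties]
    sum_w = sum(w)
    mu_fixed = sum(wi * x for wi, x in zip(w, values)) / sum_w

    # Cochran's Q: weighted dispersion of the values about mu_fixed.
    q = sum(wi * (x - mu_fixed) ** 2 for wi, x in zip(w, values))

    # Moment estimate of tau^2, truncated at zero.
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Consensus value with weights that include the dark uncertainty.
    w_star = [1.0 / (u**2 + tau2) for u in std_uncertainties]
    mu = sum(wi * x for wi, x in zip(w_star, values)) / sum(w_star)
    return mu, math.sqrt(tau2)


# Hypothetical measured values and standard uncertainties (not real data).
mu, tau = dersimonian_laird([10.0, 12.0], [1.0, 1.0])
print(f"consensus = {mu:.3f}, dark uncertainty = {tau:.3f}")
```

An estimate of tau equal to zero indicates that the reported uncertainties alone can account for the dispersion of the measured values; the larger tau is relative to the reported uncertainties, the greater the tension between the results.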