Opportunities and illusions of using large samples in statistical inference
Affiliation:
1. University of Gdańsk (Uniwersytet Gdański), Faculty of Management (Wydział Zarządzania)
Abstract
The theory of statistical inference clearly describes the benefits of large samples. The larger the sample, the smaller the standard errors of the estimated population parameters (that is, the more precise the estimates) and the greater the power of statistical tests in hypothesis testing. Today’s easy access not only to large samples (e.g. web panels) but also to more advanced and user-friendly statistical software may obscure the potential threats faced by statistical inference based on large samples. Some researchers seem to be under the illusion that large samples reduce not only random errors, which are inherent in any sampling technique, but also non-random errors. Additionally, the role of a large sample size is an important aspect of the issue of statistical significance (p-value), much discussed in recent years, and of the problems related to its determination and interpretation.
The aim of the paper is to present and discuss the consequences of focusing solely on the advantages of large samples while ignoring the threats and challenges they pose to statistical inference. The study shows that a large sample collected using a non-random sampling technique is no substitute for a random sample. This applies in particular to online panels of volunteers willing to participate in a survey. The paper also shows that the sampling error may contain a non-random component which should not be regarded as a function of the sample size. As for the contemporary challenges related to hypothesis testing, the study discusses and exemplifies the scientific and ethical aspects of searching for statistical significance using large samples or multiple sampling.
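The two quantitative points of the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the numbers, and the simplifying assumptions (a sample mean with known standard deviation `sigma`, a fixed selection bias `bias`, and a two-sided z-test) are hypothetical and chosen only for illustration. Under these assumptions the random component of the error shrinks as 1/sqrt(n), the non-random component does not depend on n, and a negligible effect becomes "statistically significant" once n is large enough.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Random component of the error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def rmse(sigma: float, n: int, bias: float) -> float:
    """Root mean squared error of a sample mean with a fixed bias.

    Assumes MSE = bias^2 + sigma^2 / n, so the bias term (for example
    self-selection in a volunteer panel) does not shrink as n grows.
    """
    return math.sqrt(bias ** 2 + sigma ** 2 / n)


def two_sided_p_value(effect: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a z-test for an observed mean difference `effect`.

    Uses p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), with z = effect / SE.
    """
    z = abs(effect) / standard_error(sigma, n)
    return math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    sigma, bias, effect = 10.0, 2.0, 0.1  # illustrative values only
    for n in (100, 10_000, 1_000_000):
        print(
            f"n={n:>9}: SE={standard_error(sigma, n):.4f}  "
            f"RMSE with bias={rmse(sigma, n, bias):.4f}  "
            f"p-value for tiny effect={two_sided_p_value(effect, sigma, n):.3g}"
        )
```

In this sketch the standard error keeps falling as n increases, while the total error levels off near the size of the bias. The p-value of the same small effect moves from clearly non-significant to very small purely because n grows.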
Publisher
Główny Urząd Statystyczny
Subject
General Earth and Planetary Sciences