A Simulation Study of Sampling in Difficult Settings: Statistical Superiority of a Little-Used Method

Author:

Shannon Harry S., Emond Patrick D., Bolker Benjamin M., Viveros-Aguilera Román

Abstract

Background: Taking a representative sample to determine the prevalence of variables such as disease is difficult when little is known about the target population. Several methods have been proposed that apply cluster sampling techniques to Primary Sampling Units (PSUs), which are typically towns or census tracts. Some methods are based on random walks within towns, e.g., the original World Health Organization’s Expanded Programme on Immunization (‘EPI’) surveys and variants, including sampling from four quadrants of each town (‘Quad’). Several major international surveys take random samples from small areas (‘SA’) such as census tracts. Another method uses satellite images and Global Positioning Systems to randomly sample within PSUs from squares in a superimposed grid (‘Square’). We used computer simulations to compare these sampling methods and simple random sampling within towns (‘SRS’) in virtual populations. SRS was our standard, even though it is impractical in low-information settings.

Methods: We constructed 50 virtual populations with varying characteristics, each comprising about a million people spread over 300 towns. The risk of disease for each person varied within and between towns. We created a binary exposure variable and allocated disease statuses to individuals assuming four relative risks (RRs) from exposure. We added three populations with equal risk of disease for every person in the population. For each population, each of the sampling methods (EPI, Quad, SA, Square, and SRS), and each of three sample sizes per PSU (7, 15, and 30), we simulated 1000 samples. For each simulation we estimated the prevalence and RRs. We used the bias and variance of the estimates to calculate the Root Mean Squared Error (RMSE) of these estimates. We ranked the RMSEs of each method and computed the ratio of each method’s RMSE to that of SRS for each population. We computed the mean ranks and ratios across the 50 populations.

Results: Apart from SRS, Square had the lowest mean rank of RMSEs for all sample sizes when estimating prevalence. When estimating RR, Square had the lowest mean ranks for sample sizes of 15 and 30 per PSU; for n=7 per PSU, the Quad mean rank was the lowest. The results for the mean ratios of RMSEs showed the same pattern: Square had the lowest values for all sample sizes when estimating prevalence and the lowest values for the two larger sample sizes when estimating RRs. Notably, when estimating prevalence, the ratios increased with sample size per PSU for SA, Quad, and EPI, suggesting those methods did not benefit as much from the larger sample sizes as would be expected from statistical theory.

Conclusions: Of several methods that are practical in an imperfectly known population, the Square method was mostly the best, especially for the larger sample sizes. The methods that sample within small areas (Quad, SA, and EPI) do not gain as much statistical benefit as expected from larger sample sizes per PSU, because of some clustering within the areas.
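The comparison metric described in the abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: it assumes RMSE is computed as the square root of squared bias plus variance across simulated estimates, and the function names (`rmse`, `rank_and_ratio`) and all numbers are hypothetical. Ties in ranking and the averaging of ranks and ratios across populations are not handled here.

```python
import math
from statistics import mean, pvariance


def rmse(estimates, true_value):
    """RMSE of simulated estimates, built from bias and variance."""
    bias = mean(estimates) - true_value
    variance = pvariance(estimates)  # variance across the simulated samples
    return math.sqrt(bias ** 2 + variance)


def rank_and_ratio(rmse_by_method, reference="SRS"):
    """Rank methods by RMSE (1 = lowest) and divide each RMSE by the reference's."""
    ordered = sorted(rmse_by_method, key=rmse_by_method.get)
    ranks = {method: i + 1 for i, method in enumerate(ordered)}
    ratios = {
        method: value / rmse_by_method[reference]
        for method, value in rmse_by_method.items()
    }
    return ranks, ratios


# Hypothetical prevalence estimates for one population (made-up numbers).
true_prevalence = 0.10
simulated = {
    "SRS": [0.09, 0.11, 0.10, 0.10],
    "Square": [0.08, 0.12, 0.10, 0.11],
    "EPI": [0.13, 0.15, 0.12, 0.14],
}

rmse_by_method = {m: rmse(e, true_prevalence) for m, e in simulated.items()}
ranks, ratios = rank_and_ratio(rmse_by_method)
```

A ratio above 1 means the method's RMSE is larger than that of SRS for that population; in a full study these per-population ranks and ratios would then be averaged across populations.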

Publisher

Cold Spring Harbor Laboratory
