BACKGROUND
In this paper, we present practical and accessible weighting and calibration techniques that address the unique nuances of hybrid samples, especially when surveilling the health of hard-to-reach cohorts such as teens and young adults. We begin with a comprehensive review of the traditional method of composite estimation. We then introduce the method of composite weighting, which is significantly more efficient, both computationally and inferentially, when pooling data from multiple surveys. For empirical illustration, we present results from three health risk factor surveillance surveys of teens and young adults. Each survey relies on a hybrid sample composed of a probability-based component drawn from the Delivery Sequence File (DSF) of the USPS and supplemental components from online panels.
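The following minimal Python sketch contrasts the two approaches. It is not the authors' implementation. It assumes that the compositing factors λ_k are set proportional to each component's Kish effective sample size, which is one common choice. The function names and the simulated DSF and panel components are hypothetical.

Under composite estimation, separately weighted means are combined one survey at a time. Under composite weighting, each survey's weights are rescaled to the same population total, multiplied by λ_k, and stacked into a single file. The weighted mean of that file reproduces the composite estimate.

```python
import numpy as np

def kish_neff(w):
    """Kish's effective sample size: (sum of w)^2 / sum of w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def composite_estimate(surveys):
    """Composite estimation: combine per-survey weighted means one survey at a time.
    Assumption: lambda_k is proportional to each survey's Kish effective sample size."""
    neff = np.array([kish_neff(w) for _, w in surveys])
    lam = neff / neff.sum()                      # compositing factors sum to 1
    means = np.array([np.average(y, weights=w) for y, w in surveys])
    return float(lam @ means), lam

def composite_weights(surveys, lam, pop_total):
    """Composite weighting (hypothetical helper): rescale each survey's weights to the
    population total, multiply by lambda_k, and stack all records into one file."""
    ys, ws = [], []
    for (y, w), lam_k in zip(surveys, lam):
        w = np.asarray(w, dtype=float)
        ws.append(lam_k * w * pop_total / w.sum())
        ys.append(np.asarray(y, dtype=float))
    return np.concatenate(ys), np.concatenate(ws)

# Hypothetical data: a binary risk-factor indicator and base weights per component.
rng = np.random.default_rng(0)
dsf = (rng.binomial(1, 0.30, 400), rng.uniform(1.0, 3.0, 400))    # probability (DSF) component
panel = (rng.binomial(1, 0.35, 900), rng.uniform(0.5, 1.5, 900))  # online panel component
surveys = [dsf, panel]

est, lam = composite_estimate(surveys)
y_all, w_all = composite_weights(surveys, lam, pop_total=1_000_000)
pooled = np.average(y_all, weights=w_all)
print(np.isclose(est, pooled))  # True
```

Because the pooled file carries a single weight column, any further adjustment, such as raking to common benchmarks, can then be applied once to the integrated data rather than separately to each survey.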
OBJECTIVE
To offer practical and accessible techniques that address the unique nuances of hybrid samples, especially when surveilling the health of hard-to-reach cohorts such as teens and young adults.
METHODS
Mathematical derivations of the proposed method, with empirical illustrations based on current survey data.
RESULTS
Empirical results from three surveillance surveys of teens and young adults demonstrate that the proposed method is more efficient, both computationally and inferentially, when pooling data from multiple surveys.
CONCLUSIONS
Compared with the traditional method of composite estimation, in which separate estimates from different surveys are combined one at a time, our proposed composite weighting methodology for integrating survey data offers at least four distinct advantages:
1. Composite weighting is far less cumbersome than composite estimation because it lets researchers work with a single data file rather than multiple sets of data and weights from unintegrated surveys.
2. An integrated database that is larger than any of its individual sample components accommodates more nuanced weighting adjustments than might be possible with individual surveys. This is especially appealing when one of the surveys has a small sample size, because coarse weighting can then fail to improve the representation of its respondents.
3. Integrated survey data allow more in-depth analyses, particularly when comparisons of smaller analytical subgroups are of interest. Such deep-dive multivariate analyses are not feasible when producing separate estimates from individual surveys, some of which may be of modest size.
4. Lastly, composite weighting eliminates the extraneous variability that is inevitable under composite estimation when inconsistent weighting procedures are applied to individual surveys, such as different benchmarks, raking algorithms, and weight trimming rules. Moreover, survey estimates from the resulting integrated data will have smaller and consistently calculated standard errors, owing to the larger combined sample.
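Points 2 and 4 rest on calibrating the integrated file with a single, consistent procedure. The minimal sketch below shows one such procedure: raking (iterative proportional fitting) of the pooled weights to shared marginal benchmarks.

The variables, category codes, and population totals are hypothetical. A production version would also need weight trimming and checks for empty categories, which this sketch omits.

```python
import numpy as np

def rake(weights, groups, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting of one weight vector to marginal totals.

    groups:  dict mapping a variable name to an integer array of category codes
    targets: dict mapping the same names to arrays of population totals per category
    Assumes every category has at least one respondent and all targets share one grand total.
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(max_iter):
        max_change = 0.0
        for var, codes in groups.items():
            # Current weighted total in each category of this variable.
            current = np.bincount(codes, weights=w, minlength=len(targets[var]))
            factors = targets[var] / current
            w *= factors[codes]                  # scale each record by its category's factor
            max_change = max(max_change, np.abs(factors - 1.0).max())
        if max_change < tol:                     # all margins already matched
            break
    return w

# Hypothetical pooled file: composite weights plus two raking variables.
rng = np.random.default_rng(1)
n = 1300
age = rng.integers(0, 3, n)       # hypothetical age-group codes (3 groups)
sex = rng.integers(0, 2, n)       # hypothetical sex codes (2 groups)
w0 = rng.uniform(0.5, 3.0, n)     # e.g., composite weights before raking
targets = {"age": np.array([400_000.0, 350_000.0, 250_000.0]),
           "sex": np.array([510_000.0, 490_000.0])}

w = rake(w0, {"age": age, "sex": sex}, targets)
print(np.bincount(age, weights=w, minlength=3))
print(np.bincount(sex, weights=w, minlength=2))
```

Every respondent in the pooled file is raked against the same benchmarks with the same algorithm. As a result, the inconsistencies that arise when each survey is weighted separately do not occur.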